Generate a random number within a range, considering a mean value

I want to generate a random number within a range while considering a mean value.
I have a solution for generating the range:
turtles-own [age]

to setup
  crt 2 [
    get-age
  ]
end

to get-age
  let min-age 65
  let max-age 105
  set age ( min-age + random ( max-age - min-age ) )
end
However, with this approach every number in the range is generated with the same probability, which doesn't make much sense in this case, as far more people are 65 than 105 years old.
Therefore, I want to include a mean value. I found random-normal, but since I don't have a standard deviation and my values are not normally distributed, I can't use that approach.
Edit:
An example: I have two agent typologies. Agent typology 1 has a mean age of 79 and an age range of 67-90. Agent typology 2 has a mean age of 77 and an age range of 67-92.
If I implement the agent typologies in NetLogo as described above, I get a mean age of 78 for agent typology 1 and a mean age of 79 for agent typology 2. The reason is that the same number of agents is generated for every age. In the end, this gives me the wrong result for my artificial population.
[Editor's note: Comment from asker added here.]
I want a distribution of values with the most values at the min value and the fewest values at the max value. However, the curve of the distribution is not necessarily a decreasing straight line, which is why I need the mean value. I need this approach because one agent typology might have an age range of 65-90 and a mean age of 70, while another agent typology has the same age range but a mean age of 75. The real age distributions for these agents would therefore look different.

This is a maths problem rather than a NetLogo problem. You haven't worked out what you want your distribution to look like (lots of different curves can have the same min, max and mean). If you don't know what your curve looks like, it's pretty hard to code it in NetLogo.
However, let's take the simplest curve: two uniform distributions, one from the min to the mean and the other from the mean to the max. While the density is not decreasing along its whole length, it will give you the min, max and mean that you want, and the higher values will have lower probability as long as the mean is below the midpoint between min and max (as it is if your target distribution is decreasing). The only question is the probability of selecting from each of the two uniform distributions.
If L is your min (low value), H is your max (high value) and M is the mean, you need to find the probability P of drawing from the lower range, with (1-P) for the upper range. The lower uniform has mean (L+M)/2 and the upper uniform has mean (M+H)/2, and the mean of the combined distribution has to come out to M. For the two halves to balance around M, the probability-weighted distances of their means from M must be equal: P(M-L)/2 = (1-P)(H-M)/2, that is P(M-L) = (1-P)(H-M). Solving for P gets you:
P = (H-M) / (H - L)
Put it into a function:
to-report random-tworange [#min #max #mean]
  let prob (#max - #mean) / (#max - #min)
  ifelse random-float 1 < prob
    [ report #min + random-float (#mean - #min) ]
    [ report #mean + random-float (#max - #mean) ]
end
To test this, try different values in the following code:
to testme
  let testvals []
  let low 77
  let high 85
  let target 80
  repeat 10000 [ set testvals lput (random-tworange low high target) testvals ]
  print mean testvals
end
One other thing you should think about - how much does age matter? This is a design question. You only need to include things that change an agent's behaviour. If agents with age 70 make the same decisions as those with age 80, then all you really need is that the age is in this range and not the specific value.

Related

Design L1 and L2 distance functions to assess the similarity of bank customers

I am having a hard time with the question below. I am not sure if I got it correct, but either way, I need some help further understanding it. If anyone has time to explain, please do.
Design L1 and L2 distance functions to assess the similarity of bank customers. Each customer is characterized by the following attributes:
− Age (the customer’s age, a real number with a maximum of 90 years and a minimum of 15 years)
− Cr (“credit rating”), an ordinal attribute with values ‘very good’, ‘good’, ‘medium’, ‘poor’, and ‘very poor’
− Av_bal (average account balance, a real number with mean 7000 and standard deviation 4000)
Using the L1 distance function, compute the distance between the following 2 customers: c1 = (55, good, 7000) and c2 = (25, poor, 1000). [15 points]
Using the L2 distance function, compute the distance between the above-mentioned 2 customers.
Answer with L1
d(c1,c2) = (c1.cr - c2.cr)/4 + (c1.avg.bal - c2.avg.bal/4000) * (c1.age - mean.age/std.age) - (c2.age - mean.age/std.age)
The question as it stands leaves some room for interpretation, mainly because similarity is not specified exactly. I will try to explain what the standard approach would be.
Usually, before you start, you want to normalize values such that they are roughly in the same range. Otherwise, your similarity will be dominated by the feature with the largest variance.
If you have no information about the distribution but only the range of the values, you want to normalize them to [0,1]. For your example this means
norm_age = (age-15)/(90-15)
For nominal values you want to find a mapping to ordinal values if you want to use Lp-norms. Note: this is not always possible (e.g., colors cannot intuitively be mapped to ordinal values). In your case you can transform the credit rating like this:
cr = {0 if ‘very good’, 1 if ‘good’, 2 if ‘medium’, 3 if ‘poor’, 4 if ‘very poor’}
Afterwards, you can apply the same normalization as for age:
norm_cr = cr/4
Lastly, for normally distributed values you usually perform standardization by subtracting the mean and dividing by the standard deviation.
norm_av_bal = (av_bal-7000)/4000
Now that you have normalized your values, you can go ahead and define the distance functions:
L1(c1, c2) = |c1.norm_age - c2.norm_age| + |c1.norm_cr - c2.norm_cr| + |c1.norm_av_bal - c2.norm_av_bal|
and
L2(c1, c2) = sqrt((c1.norm_age - c2.norm_age)^2 + (c1.norm_cr - c2.norm_cr)^2 + (c1.norm_av_bal - c2.norm_av_bal)^2)
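To make this concrete, here is a small Python sketch (my own illustration, not part of the original answer; the names CR_MAP, normalize, l1 and l2 are mine) that applies the normalizations above to the two example customers and evaluates both distances:

import math

# ordinal mapping for the credit rating, as defined above
CR_MAP = {"very good": 0, "good": 1, "medium": 2, "poor": 3, "very poor": 4}

def normalize(age, cr, av_bal):
    norm_age = (age - 15) / (90 - 15)      # min-max normalization to [0, 1]
    norm_cr = CR_MAP[cr] / 4               # ordinal rating scaled to [0, 1]
    norm_av_bal = (av_bal - 7000) / 4000   # standardization with mean 7000, std 4000
    return (norm_age, norm_cr, norm_av_bal)

def l1(c1, c2):
    return sum(abs(a - b) for a, b in zip(c1, c2))

def l2(c1, c2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

c1 = normalize(55, "good", 7000)
c2 = normalize(25, "poor", 1000)
print(l1(c1, c2))  # 0.4 + 0.5 + 1.5 = 2.4
print(l2(c1, c2))  # sqrt(0.16 + 0.25 + 2.25), roughly 1.63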

Algorithm to calculate sum of points for groups with varying member count [closed]

Let's start with an example. In Harry Potter, Hogwarts has 4 houses with students sorted into each house. The same happens on my website, and I don't know how many users are in each house. It could be 20 in one house, 50 in another, and 100 in the third and fourth.
Now, each student can earn points on the website and at the end of the year, the house with the most points will win.
But it's not fair to "only" sum the points, as the house with 100 students will have a much higher chance to win, simply because it has more users to earn points. So I need to come up with an algorithm which is fair.
You can see an example here: https://worldofpotter.dk/points
What I do now is to sum all the points for a house, and then divide it by the number of users who have earned more than 10 points. This is still not fair, though.
Any ideas on how to make this calculation more fair?
Things we need to take into account:
* The percent of users earning points in each house
* Few users earning LOTS of points
* Many users earning FEW points (It's not bad earning few points. It still counts towards the total points of the house)
Link to MySQL dump(with users, houses and points): https://worldofpotter.dk/wop_points_example.sql
Link to CSV of points only: https://worldofpotter.dk/points.csv
I'd use something like Discounted Cumulative Gain which is used for measuring the effectiveness of search engines.
The concept is as follows:
FUNCTION evalHouseScore (0_INDEXED_SORTED_ARRAY scores):
    score = 0;
    FOR (int i = 0; i < scores.length; i++):
        score += scores[i]/log2(i+2);   // i+2 so the divisor is never log2(0) or log2(1) = 0
    END_FOR
    RETURN score;
END_FUNCTION;
This must be modified somewhat, as this way of measuring focuses on the first results. Since this is subjective, you should decide on the way you would modify it. Below I'll post the code with some constants which you should try with different values:
FUNCTION evalHouseScore (0_INDEXED_SORTED_ARRAY scores):
    score = 0;
    FOR (int i = 0; i < scores.length; i++):
        score += scores[i]/log2(i+K);
    END_FOR
    RETURN L*score;
END_FUNCTION
Consider changing the logarithm.
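For reference, here is a direct Python translation of the pseudocode above (the function name and the default values for the constants K and L are my own placeholders; tune them as suggested):

import math

def eval_house_score(scores, k=2, l=1.0):
    # scores must be sorted in descending order; i + k keeps the divisor positive
    total = 0.0
    for i, score in enumerate(scores):
        total += score / math.log2(i + k)
    return l * total

# usage, e.g. on a house's point list:
# eval_house_score(sorted(points, reverse=True))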
Tests:
int[] g = new int[] {758,294,266,166,157,132,129,116,111,88,83,74,62,60,60,52,43,40,28,26,25,24,18,18,17,15,15,15,14,14,12,10,9,5,5,4,4,4,4,3,3,3,2,1,1,1,1,1};
int[] s = new int[] {612,324,301,273,201,182,176,139,130,121,119,114,113,113,106,86,77,76,65,62,60,58,57,54,54,42,42,40,36,35,34,29,28,23,22,19,17,16,14,14,13,11,11,9,9,8,8,7,7,7,6,4,4,3,3,3,3,2,2,2,2,2,2,2,1,1,1};
int[] h = new int[] {813,676,430,382,360,323,265,235,192,170,107,103,80,70,60,57,43,41,21,17,15,15,12,10,9,9,9,8,8,6,6,6,4,4,4,3,2,2,2,1,1,1};
int[] r = new int[] {1398,1009,443,339,242,215,210,205,177,168,164,144,144,92,85,82,71,61,58,47,44,33,21,19,18,17,12,11,11,9,8,7,7,6,5,4,3,3,3,3,2,2,2,1,1,1,1};
The output for different offsets is:
1182
1543
1847
2286
904
1231
1421
1735
813
1120
1272
1557
It sounds like some sort of constraint between the houses may need to be introduced. I might suggest finding the person that earned the most points across all the houses and using that score as the denominator when rolling up the scores. This guarantees that the max value of a user's contribution is 1; all the scores for a house can then be summed and divided by the number of users to normalize the house's score. That should give you a reasonable comparison.
It does introduce issues with houses that have a small number of high-achieving users, so you may want to consider a lower limit on the number of house members. Another technique would be to introduce handicap scores for users to balance the scales.
The algorithm will most likely flex over time based on the data you receive; to keep it fair it will need some responsive action after the initial iteration, since players can come up with creative ways to make scoring systems work for them. Here is some pseudo-code in PHP that you may use:
<?php
$mostPointsEarned; // Find the user that earned the most points
$houseScores = [];
foreach ($houses as $house) {
    $numberOfUsers = 0;
    $normalizedScores = [];
    foreach ($house->getUsers() as $user) {
        $normalizedScores[] = $user->getPoints() / $mostPointsEarned;
        $numberOfUsers++;
    }
    $houseScores[] = array_sum($normalizedScores) / $numberOfUsers;
}
var_dump($houseScores);
You haven't given any examples of what the preferred state should be, or of which situations you want to be immune against (3,2,1,1 compared to 5,2, etc.).
It's also a pity you haven't provided the dataset in some nice form to play with.
scala> val input = Map( // as seen on 2016-09-09 14:10 UTC on https://worldofpotter.dk/points
'G' -> Seq(758,294,266,166,157,132,129,116,111,88,83,74,62,60,60,52,43,40,28,26,25,24,18,18,17,15,15,15,14,14,12,10,9,5,5,4,4,4,4,3,3,3,2,1,1,1,1,1),
'S' -> Seq(612,324,301,273,201,182,176,139,130,121,119,114,113,113,106,86,77,76,65,62,60,58,57,54,54,42,42,40,36,35,34,29,28,23,22,19,17,16,14,14,13,11,11,9,9,8,8,7,7,7,6,4,4,3,3,3,3,2,2,2,2,2,2,2,1,1,1),
'H' -> Seq(813,676,430,382,360,323,265,235,192,170,107,103,80,70,60,57,43,41,21,17,15,15,12,10,9,9,9,8,8,6,6,6,4,4,4,3,2,2,2,1,1,1),
'R' -> Seq(1398,1009,443,339,242,215,210,205,177,168,164,144,144,92,85,82,71,61,58,47,44,33,21,19,18,17,12,11,11,9,8,7,7,6,5,4,3,3,3,3,2,2,2,1,1,1,1)
) // and the results on the website were: 1. R 1951, 2. H 1859, 3. S 990, 4. G 954
Here is what I thought of:
def singleValuedScore(individualScores: Seq[Int]) = individualScores
  .sortBy(-_) // sort from most to least
  .zipWithIndex // add indices e.g. (best, 0), (2nd best, 1), ...
  .map { case (score, index) => score * (1 + index) } // here is the 'logic'
  .max
input.mapValues(singleValuedScore)
res: scala.collection.immutable.Map[Char,Int] =
Map(G -> 1044,
S -> 1590,
H -> 1968,
R -> 2018)
The overall positions would be:
Ravenclaw with 2018 aggregated points
Hufflepuff with 1968
Slytherin with 1590
Gryffindor with 1044
Which corresponds to the ordering on that web: 1. R 1951, 2. H 1859, 3. S 990, 4. G 954.
The algorithm's output is the maximal product of a user's score and that user's rank within the house.
This measure is not affected by the "long tail" of users having low scores compared to the active ones.
There are no hand-set cutoffs or thresholds.
You could experiment with the rank attribution (score * index or score * Math.sqrt(index) or score / Math.log(index + 1) ...)
I take it that the fair measure is the number of points divided by the number of house members. Since you have the number of points, the exercise boils down to estimate the number of members.
We are in short supply of data here as the only hint we have on member counts is the answers on the website. This makes us vulnerable to manipulation, members can trick us into underestimating their numbers. If the suggested estimation method to "count respondents with points >10" would be known, houses would only encourage the best to do the test to hide members from our count. This is a real problem and the only thing I will do about it is to present a "manipulation indicator".
How could we then estimate member counts? Since we do not know anything other than test results, we have to infer the propensity to do the test from the actual results. And we have little other to assume than that we would have a symmetric result distribution (of the logarithm of the points) if all members tested. Now let's say the strong would-be respondents are more likely to actually test than weak would-be respondents. Then we could measure the extra dropout ratio for the weak by comparing the numbers of respondents in corresponding weak and strong test-point quantiles.
To be specific, of the 205 answers, there are 27 in the worst half of the overall weakest quartile, while 32 are in the strongest half of the best quartile. So an extra 5 respondents of the very weakest have dropped out from an assumed all-testing symmetric population, and to adjust for this, we estimate the member count from this quantile by multiplying the number of responses in it by 32/27, about 1.2. Similarly, we have 29/26 for the next less-extreme half-quartiles and 41/50 for the two mid quartiles.
So we would estimate members by simply counting the respondents, but multiplying the number of respondents in the weak bands mentioned above by 1.2, 1.1 and 0.8 respectively. If, however, any result distribution within a house were conspicuously skewed (which is not the case now), we would have to suspect manipulation and redesign our member count estimate.
For the sample at hand, however, these adjustments to member counts are minor and yield the same house ranks as simply counting the respondents without adjustments.
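A rough Python sketch of this adjustment, under my own reading of the procedure (the function name and band boundaries are my assumptions): each weak band's respondent count is scaled by the ratio of respondents in the mirrored strong band to respondents in the weak band, computed over all houses pooled together.

import numpy as np

def estimate_members(houses):
    # houses: dict mapping house name -> list of respondents' point totals
    pooled = np.concatenate([np.asarray(v, dtype=float) for v in houses.values()])
    # weak bands (bottom eighth, next eighth, second quartile) and their mirrors
    weak_bands = [(0.0, 0.125), (0.125, 0.25), (0.25, 0.5)]
    strong_bands = [(0.875, 1.0), (0.75, 0.875), (0.5, 0.75)]

    def count_in(scores, lo, hi):
        lo_v, hi_v = np.quantile(pooled, lo), np.quantile(pooled, hi)
        if hi == 1.0:
            return int(np.sum(scores >= lo_v))
        return int(np.sum((scores >= lo_v) & (scores < hi_v)))

    # pooled dropout ratios, e.g. 32/27, 29/26 and 41/50 in the data above
    ratios = [count_in(pooled, *s) / count_in(pooled, *w)
              for w, s in zip(weak_bands, strong_bands)]

    estimates = {}
    for name, scores in houses.items():
        scores = np.asarray(scores, dtype=float)
        strong_half = count_in(scores, 0.5, 1.0)   # counted as-is
        weak_half = sum(r * count_in(scores, *w)
                        for r, w in zip(ratios, weak_bands))
        estimates[name] = strong_half + weak_half
    return estimates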
I amused myself a little with your question and some Python programming on randomly generated data. As some people mentioned in the comments, you need to define what fairness means. If, as you said, you don't know the number of people in each house, you can use the number of participations of each house instead, which also motivates participation (this can be unfair depending on the number of people in each house, but as you said, you don't have that data in the first place).
The important part of the code is the following.
import numpy as np
from numpy.random import randint  # random integers

# initialize random seed
np.random.seed(4)
houses = ["Gryffindor", "Slytherin", "Hufflepuff", "Ravenclaw"]
houses_points = []
# generate random data for each house
for _ in houses:
    # houses_points.append(randint(0, 100, randint(60, 100)))
    houses_points.append(randint(0, 50, randint(2, 10)))
# count participation
houses_participations = []
houses_total_points = []
for house_id in range(len(houses)):
    houses_total_points.append(np.sum(houses_points[house_id]))
    houses_participations.append(len(houses_points[house_id]))
# sum the total number of participations
total_participations = np.sum(houses_participations)
# proposed model with weighted total participation points
houses_partic_points = []
for house_id in range(len(houses)):
    # integer (floor) division, matching the integer results reported below
    tmp = houses_total_points[house_id] * houses_participations[house_id] // total_participations
    houses_partic_points.append(tmp)
The results of this method are the following:
House Points per Participant
Gryffindor: [46 5 1 40]
Slytherin: [ 8 9 39 45 30 40 36 44 38]
Hufflepuff: [42 3 0 21 21 9 38 38]
Ravenclaw: [ 2 46]
House Number of Participations per House
Gryffindor: 4
Slytherin: 9
Hufflepuff: 8
Ravenclaw: 2
House Total Points
Gryffindor: 92
Slytherin: 289
Hufflepuff: 172
Ravenclaw: 48
House Points weighted by a participation factor
Gryffindor: 16
Slytherin: 113
Hufflepuff: 59
Ravenclaw: 4
You'll find the complete file with printing results here (https://gist.github.com/silgon/5be78b1ea0b55a20d90d9ec3e7c515e5).
You should add some more rules to define fairness.
Idea 1
You could set up the rule that anyone has to earn at least 10 points to enter the competition.
Then you can calculate the average points for each house.
Positive: Everyone needs to show some motivation.
Idea 2
Another approach would be to set the rule that from each house only the 10 best students will count for the competition.
Positive: Easy rule to calculate the points.
Negative: Students might become uninterested if they see they can't reach the top 10 places of their house.
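Both rules are one-liners to compute. A minimal sketch, assuming points is the list of each student's points for one house (function names are my own):

def idea1_average(points, threshold=10):
    # average over students who earned at least `threshold` points
    qualified = [p for p in points if p >= threshold]
    return sum(qualified) / len(qualified) if qualified else 0

def idea2_top_n(points, n=10):
    # sum of the best n scores in the house
    return sum(sorted(points, reverse=True)[:n])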
From my point of view, your problem divides into a few points:
The best thing to do would be to reassign the players to the different houses so that each house has the same number of players (as explained by @navid-vafaei).
You may not want to do that if you believe it would hurt your game's popularity with players who are in a house they don't want, since you can change the choice of the Sorting Hat, at least in the movies or books.
In that case, you can sum the points of a house's students and divide by the number of students. You might simply exclude students with a very low score, and you might also exclude students with very low activity, because students who skip school might be expelled.
The most important part of your algorithm, for me, is whether or not you give points for all valuable things:
In the Harry Potter story, the students earn points in the different subjects they take at school, according to their scores.
At the end of the year, there is a special award event. At that moment, the headmaster gives points for valuable things which cannot be evaluated in school subjects, such as personal qualities (bravery, for example).

How to create a scoring system using two variables

I have an application (Node/Angular) that I'm creating where I'm trying to rank users based on overall performance across two metrics. The two metrics we are using to track the users are the following:
Units Produced (ranges between 0 - 6000)
Rate of production = [ Units Produced ] / [ Labor Hours ] (ranges between 0 - 100)
However, ranking users explicitly by either of these variables doesn't make sense, because it creates some strange incentives/behaviors.
For instance, it is possible to have a really high Rate of Production but a very low total number of Units Produced, by working really hard over a short period of time. Alternatively, a user can have a very high number of Units Produced simply because they worked overtime and thus had longer to produce units than anyone else, while still having a low Rate of Production.
Does anyone have experience designing these types of scoring systems? How have you handled it?
First, I would recommend bringing the two metrics onto the same scale, e.g. dividing Units Produced by 60.
Then, if you are fine with equal weights, there are three common simple choices:
Add the scores
Multiply the scores (equal to adding logs of each)
Take the minimum of the two scores
Which of these is best depends on to what extent you want the measure to reward combined good results. In your case, I would recommend multiplying the scores and putting a scale on the resulting product.
If you want to go a little more complex and weigh or play around with how much to reward separate vs joint scores, you can use the following formula:
V = alpha * log_b[Units Produced / 60] + (1-alpha) * log_b[Rate of Production],
where alpha determines the weighting of one vs the other and the base of the logarithmic function determines to what extent a joint success is rewarded.
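As a minimal sketch of that formula (the function name and the defaults for alpha and the base are my own; inputs are clamped to avoid taking the log of zero):

import math

def combined_score(units_produced, rate_of_production, alpha=0.5, base=2):
    u = max(units_produced / 60, 1e-9)   # brought onto the same scale as the rate
    r = max(rate_of_production, 1e-9)
    return alpha * math.log(u, base) + (1 - alpha) * math.log(r, base)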
I did something very similar. I found it valuable to break users into leagues or tiers, for example using Units Produced as a base:
Novice = 100 Units Produced
Beginner = 500 Units Produced
Advanced = 2000 Units Produced
Expert = 4000 Units Produced
Putting this into a usable object:
var levels = [
    {id: 1, name: "Novice", minUnits: 100, maxUnits: 599 },
    {id: 2, name: "Beginner", minUnits: 500, maxUnits: 1999 },
    {id: 3, name: "Advanced", minUnits: 2000, maxUnits: 3999 },
    {id: 4, name: "Expert", minUnits: 4000, maxUnits: 6000 }
]
You can then multiply your Rate of Production by a weighted value inside the levels; you determine what that weight is. You can play with the values to make it as hard or as easy as you want.
You can do a combination with
SCORE = 200/( K_1/x_1 + K_2/x_2 )
// x_1 : Score 1
// x_2 : Score 2
// K_1 : Maximum of Score 1
// K_2 : Maximum of Score 2
Of course, be careful when dividing by zero: if either x_1 or x_2 is zero, then SCORE = 0. If x_1 = K_1 and x_2 = K_2, then SCORE = 100 (the maximum).
Otherwise the score is somewhere in between. If x_1/K_1 = x_2/K_2 = z, then SCORE = 100*z.
This weighs the lower score more heavily, so you get rewarded for raising one of the two scores (unlike a minimum-of-the-two scheme), but not as much as for raising both.
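A small sketch of this score with the zero guard (the defaults for K_1 and K_2 are the maxima given in the question, 6000 units and a rate of 100):

def harmonic_score(x1, x2, k1=6000, k2=100):
    # SCORE = 200 / (K_1/x_1 + K_2/x_2): 0 if either input is 0, 100 when both are at their max
    if x1 == 0 or x2 == 0:
        return 0.0
    return 200.0 / (k1 / x1 + k2 / x2)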

Non-linear comparison sorting / scoring

I have an array I want to sort based on assigning a score to each element in the array.
Let's say the possible score range is 0-100. And to get that score we are going to use 2 comparison data points, one with a weighting of 75 and one with a weighting of 25. Let's call them valueA and valueB. And we will transpose each value into a score. So:
valueA (range = 0-10,000)
valueB (range = 0-70)
scoreA (range = 0 - 75)
scoreB (range = 0 - 25)
scoreTotal = scoreA + scoreB (0 - 100)
Now the question is how to transpose valueA to scoreA in a non-linear way with heavier weighting for being close to the min value. What I mean by that is that for valueA, 0 would be a perfect score (75), but a value of say 20 would give a mid-point score of 37.5 and a value of say 100 would give a very low score of say 5, and then everything greater would trend towards 0 (e.g. a value of 5,000 would be essentially 0). Ideally I could setup a curve with a few data points (say 4 quartile points) and then the algorithm would fit to that curve. Or maybe the simplest solution is to create a bunch of points on the curve (say 10) and do a linear transposition between each of those 10 points? But I'm hoping there is a much simpler algorithm to accomplish this without figuring out all the points on the curve myself and then having to tweak 10+ variables. I'd rather 1 or 2 inputs to define how steep the curve is. Possible?
I don't need something super complex or accurate, just a simple algorithm so there is greater weighting for being close to the min of the range, and way less weighting for being close to the max of the range. Hopefully this makes sense.
My stats math is so rusty I'm not even sure what this is called for searching for a solution. All those years of calculus and statistics for naught.
I'm implementing this in Objective C, but any c-ish/java-ish pseudo code would be fine.
A function you may want to try is
max / [(log(x+2)/log(2))^N]
where max is either 75 or 25 in your case. The log(x+2)/log(2) part ensures that f(0) == max (you can substitute log(x+C)/log(C) here for any C > 1; a higher C will slow the curve's descent); the ^N determines how quickly your function drops to 0 (you can plot the function to get a picture of what's going on).
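In Python this could look like the following sketch (max_score, c and n are placeholder values to tune, as described above):

import math

def transpose_score(value, max_score=75.0, c=2.0, n=3.0):
    # f(0) == max_score because log(0 + c) / log(c) == 1; a larger c slows the
    # descent, a larger n makes the curve drop toward 0 faster
    return max_score / (math.log(value + c) / math.log(c)) ** n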

Algorithm to distribute points between weighted items?

I have 100 points to dole out to a set of items. Each item must receive a proportional number of points relative to others (a weight). Some items may have a weight of 0, but they must receive some points.
I tried to do it by giving each item 5 points, then doling out the remaining points proportionally, but when I have 21 items, the algorithm assigns 0 to one item. Of course, I could hand out 1 point initially, but then the problem remains for 101 items or more. Normally, this algorithm should deal with less than 20 items, but I want the algorithm to be robust in the face of more items.
I know using floats/fractions would be perfect, but the underlying system must receive integers, and the total must be 100.
This is framework / language agnostic, although I will implement in Ruby.
Currently, I have this (pseudo-code):
total_weight = sum_of(items.weight)
if total_weight == 0 then
# Distribute all points equally between each item
items.points = 100 / number_of_items
# Apply remaining points in case of rounding errors (100 / 3 == [33, 33, 34])
else
items.points = 5
points_to_dole_out = 100 - number_of_items * 5
for(item in items)
item.points += item.weight * total_weight / 100
end
end
First, give every item one point. This is to meet your requirement that all items get points. Then get the % of the total weight that each item has, and award it that % of the remaining points (round down).
There will be some portion of points left over. Sort the set of items by the size of their decimal parts, and then dole out the remaining points one at a time in order from biggest decimal part to smallest.
So if an item has a weight of twelve and the total weight of all items is 115, you would first award it 1 point. If there were 4 other items, there would be 110 points left after doling out the minimum scores. You would then award the item 10 points, because its percentage of the total weight is 9.58% and 9.58% of 110 is 10.538. Then you would sort it based on the .538, and if it were near the top end, it might end up getting bumped to 11.
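Here is a short Python sketch of that procedure (my own illustration; it assumes the number of items is at most the total points, since otherwise the constraints cannot be met, as noted below):

import math

def distribute_points(weights, total=100):
    n = len(weights)
    points = [1] * n                          # every item gets at least one point
    remaining = total - n
    total_weight = sum(weights)
    if total_weight > 0:
        shares = [remaining * w / total_weight for w in weights]
    else:
        shares = [remaining / n] * n          # all weights zero: split equally
    points = [p + math.floor(s) for p, s in zip(points, shares)]
    # hand out the leftover points, biggest fractional part first
    leftovers = total - sum(points)
    order = sorted(range(n), key=lambda i: shares[i] - math.floor(shares[i]), reverse=True)
    for i in order[:leftovers]:
        points[i] += 1
    return points

print(distribute_points([12, 30, 25, 28, 20]))   # e.g. [11, 26, 22, 24, 17], sums to 100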
The 101-item case cannot be solved given the two constraints {total == 100} and {each item > 0}, so in order to be robust you must decide how to handle it. That's a business problem, not a technical one.
The minimum score is actually a business problem too. Clearly the meaning of your results with a minimum of 5 points per item is quite different from a minimum score of 1: the gap between the minimum score and the other low scores is potentially compressed. Hence you really should get clarity from the users of the system rather than just pick a number: how will they use this data?
