Calculate statistics on numbers entered by user - algorithm

This tutorial question was set by my lecturer. I don't understand it and need guidance in the right direction.
Write an algorithm to read in a list of basketball scores (non-negative integers) one at a time from the user and output the following statistics:
Total number of games.
Total number of games scoring at least 90 points.
Percentage of games scoring at least 90 points.
The user entering a negative sentinel value indicates the end of the input. Note that the sentinel value is not used in computing the highest, lowest or average game score.
Requirements:
Write pseudo code for how you would solve each statistic
Example: total number of games
For each input score, increment
games by one
Determine the variables you will need and figure out the type of each variable
Define and initialize each variable
Determine what type of loop you are going to write
Start with statistic number one (total number of games) and get your loop to compute the total number of games. When you end your loop,
output the total number of games, and then move to problem two.
You only need to write one loop.
Write a complete algorithm for the above problem.
I've tried to understand the requirement and tried googling for an alternative explanation, but was unable to find one.
n = 0 // number of games
o = 0 // total number of games scoring at least 90 points
for( o = 0; o <= 90; o++ )
{
input =get user input for score
n++
o += input
}
percentage = n/o *100
output percentage
Have I correctly understood the question criteria?
EDIT Answer Attempt 1 :-
int numGames = 0; //number of games
int numTotalPoints = 0; //total number of games scoring
int userInput = 0; //to track whether a negative number is entered
double average = 0.0; //to get the average of the games
double gameTo90Points = 0.0; //calculate total games to reach 90 points
double percentage = 0.0; //to calculate the percentage
Text.put("Input the game score");
userInput = text.getInt;
while(userInput >= 0 )
{
numTotalPoints += userInput;
numGames++;
Text.put("Input the game score");
userInput = text.getInt;
}
if(numGames == 0)
{
Text.put("Not enough score to tabulate");
}
else
{
average = ((double)numTotalPoints)/numGames;
gameTo90Points = 90/average;
percentage = (gameTo90Points/90)*100;
Text.put("Total number of games :" +numGames);
Text.put("Total number of games scoring at least 90 points:" +gameTo90Points);
Text.put("Percentage of games scoring at least 90 points:" +percentage);
}

As this is a task you must complete, we should not provide you with the answer to that assignment.
I will provide some comments on your current pseudo-code.
n = 0 // number of games
o = 0 // total number of games scoring at least 90 points
So far this is a good start, but it is better to use variable names that describe what they hold (e.g. numGames, numHighScoringGames would be good candidates). Also, the assignment asks to "figure out the type of each variable". This is something you have not done yet...
for( o = 0; o <= 90; o++ )
This loop is wrong. After the loop finishes o will be a number greater than 90. But o is supposed to be a particular number of games (with a score of at least 90). This should trigger an alarm... You haven't read any input yet and you already seem to know there will be more than 90 of such games? That's not right.
The value of o should have nothing to do with whether the loop should continue or not.
input =get user input for score
Again, the data type should be determined for the variable input.
n++
This is good, but you did not take into account this part of the assignment:
The user entering a negative sentinel value indicates the end of the input.
Your code should verify if the user entered a negative sentinel value. And if so, you should not ask for more input.
o += input
The variable o is supposed to be a number of games, but now you are adding a score to it... that cannot be right. Also, you add it unconditionally... Should you not first check whether that game is "scoring at least 90 points"?
percentage = n/o *100
Here you use o as it was intended (as a number of games). But think about this... which one of the two will be greater (when not equal)? n or o? Taking that answer into account: Is your formula correct?
Secondly, could the denominator be zero? Should you protect the code from it?
output percentage
OK, but don't forget that the assignment asks for three statistics, not just one.

Related

Converting Scratch to Algorithm

This is my first time learning algorithms, and I am trying to figure them out with Scratch. I am following tutorials on the Scratch wiki. How can I convert this to an algorithm (as a flowchart or as normal steps)? I am especially unsure about the loop. (I uploaded the script as a picture.)
I started with:
Step 1: Start
Step 2: Int: delete all of numbers, iterator, amount, sum
Step 3: How many numbers do you want?
Step 4: Initialize sum=0, amount=0, iterator=1
Step 5: Enter the element values
Step 6: Find the sum by looping over the array and updating sum; the loop must continue (number of elements - 1) times
Step 7: avg = sum / number of elements
Step 8: Print the average
I don't think this is right; I feel there are errors. Thank you for your time.
Scratch
Here is the algorithm in variant 2 (see Java algorithm below) in Scratch. The output should be identical.
Java
Here is the algorithm in Java. I have commented the steps, which should give you a step-by-step guide on how to do it in Scratch as well.
I have also implemented two variants of the algorithm to show some considerations a programmer often has to weigh when implementing an algorithm, mainly time (the time the algorithm needs to complete) and space (the memory it uses on your computer).
Please note: the following algorithms do not handle errors. For example, if a user enters a letter such as a instead of a number, the program will crash. It is easy to adjust the program to handle this, but for simplicity I did not do that.
Variant 1: Storing all elements in array numbers
This variant stores all numbers in an array numbers and calculates the sum at the end using those numbers which is slower than variant 2 as the algorithm goes over all the numbers twice. The upside is that you will preserve all the numbers the user entered and you could use that later on if you need to but you will need storage to store those values.
public static void yourAlgorithm() {
// needed in Java to get input from user
var sc = new Scanner(System.in);
// print to screen (equivalent to "say"/ "ask")
System.out.print("How many numbers do you want? ");
// get amount of numbers as answer from user
var amount = sc.nextInt();
// create array to store all elements
var numbers = new int[amount];
// set iterator to 1
int iterator = 1;
// as long as the iterator is smaller or equal to the number of required numbers, keep asking for new numbers
// equivalent to "repeat amount" except that retries are possible if no number was entered
while (iterator <= amount) {
// ask for a number
System.out.printf("%d. number: ", iterator);
// insert the number at position iterator - 1 in the array
numbers[iterator - 1] = sc.nextInt();
// increase iterator by one
iterator++;
}
// calculate the sum after all the numbers have been entered by the user
int sum = 0;
// go over all numbers again! (this is why it is slower) and calculate the sum
for (int i = 0; i < amount; i++) {
sum += numbers[i];
}
// print average to screen
System.out.printf("Average: %s / %s = %s", sum, amount, (double)sum / (double)amount);
}
Variant 2: Calculating sum when entering new number
This algorithm does not store the numbers the user enters but immediately uses the input to calculate the sum, hence it is faster as only one loop is required and it needs less memory as the numbers do not need to be stored.
This would be the best solution (fastest, least space/ memory needed) in case you do not need all the numbers the user entered later on.
// needed in Java to get input from user
var sc = new Scanner(System.in);
// print to screen (equivalent to "say"/ "ask")
System.out.print("How many numbers do you want? ");
// get amount of numbers as answer from user
var amount = sc.nextInt();
// set iterator to 1
int iterator = 1;
int sum = 0;
// as long as the iterator is smaller or equal to the number of required numbers, keep asking for new numbers
// equivalent to "repeat amount" except that retries are possible if no number was entered (e.g. character was entered instead)
while (iterator <= amount) {
// ask for a number
System.out.printf("%d. number: ", iterator);
// get number from user
var newNumber = sc.nextInt();
// add the new number to the sum
sum += newNumber;
// increase iterator by one
iterator++;
}
// print average to screen
System.out.printf("Average: %s / %s = %s", sum, amount, (double)sum / (double)amount);
Variant 3: Combining both approaches
You could also combine both approaches, i. e. calculating the sum within the first loop and additionally storing the values in a numbers array so you could use that later on if you need to.
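A minimal sketch of Variant 3, written in Python rather than Java for brevity. The function name `average_and_numbers` is an assumption made for illustration, and the input is passed in as a ready-made list instead of being read from the keyboard.

```python
def average_and_numbers(entered):
    """Sum the values in the same pass that stores them (Variant 3)."""
    numbers = []   # keeps every value, as in Variant 1
    total = 0      # running sum, as in Variant 2
    for value in entered:
        numbers.append(value)
        total += value
    if not numbers:
        raise ValueError("at least one number is required")
    return total / len(numbers), numbers


average, numbers = average_and_numbers([4, 8, 15])
print(average, numbers)
```

Only one loop runs, so the time cost matches Variant 2, while the memory cost matches Variant 1 because every number is still kept.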
Expected output

How to get the number e (2.718) using a random number sensor?

Is it possible to calculate the number e (2.718) using random numbers?
I'm assuming that when you say "using random numbers" you mean "using some sort of random sampling scheme." If you want the exact answer to an infinite number of decimals, the answer is "no, not unless you have an infinite amount of time." However, we can generate random sequences whose expected value is e, and we can assess the sampling error using basic statistics. By increasing the sample size, we can decrease the sampling error to any precision you want as long as you specify your desired confidence level.
It turns out that if you sum a bunch of random uniform(0,1)'s until the sum exceeds 1, the quantity of uniforms required has an expected value of e. We can turn that into a sampling problem by writing a method/function to return the count, and taking the average of the values obtained by calling that method multiple times.
You didn't specify any particular language, so here it is in Ruby (which is practically like pseudocode):
require 'quickstats' # install from rubygems w/ 'gem install quickstats'
def trial # generate results of one trial
  count = 0
  sum = 0.0
  while sum < 1.0
    count += 1
    sum += rand # Ruby's rand produces U(0,1) values by default
  end
  return count # added "return" keyword for non-rubyists' readability
end

stats = QuickStats.new
10_000_000.times { stats.new_obs trial } # more precision? bump up sample size
puts "Average = #{stats.avg}"
half_width = 1.96 * stats.std_err
puts "CI half-width = #{half_width}"
deviation = (stats.avg - Math::E).abs
puts " |E - avg| = #{deviation} (should be ≤ half-width 95% of the time)"
This runs in under 4 seconds on my laptop and produces outputs such as:
Average = 2.7179918000002234
CI half-width = 0.0005421324752620413
|E - avg| = 0.0002900284588216451 (should be ≤ half-width 95% of the time)
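For readers without Ruby, the same sampling scheme can be sketched in Python using only the standard library. The names `trial` and `estimate_e`, the seed, and the sample size of 200,000 are assumptions chosen for illustration; the confidence interval uses the same 1.96 standard-error rule as above.

```python
import math
import random
import statistics


def trial(rng):
    """Count how many U(0,1) draws are needed for the running sum to reach 1."""
    count = 0
    total = 0.0
    while total < 1.0:
        count += 1
        total += rng.random()
    return count


def estimate_e(num_trials, seed=0):
    """Return (mean, 95% CI half-width) over num_trials independent trials."""
    rng = random.Random(seed)
    counts = [trial(rng) for _ in range(num_trials)]
    mean = statistics.fmean(counts)
    std_err = statistics.stdev(counts) / math.sqrt(num_trials)
    return mean, 1.96 * std_err


mean, half_width = estimate_e(200_000)
print(f"Average = {mean}, CI half-width = {half_width}")
```

A smaller sample gives a wider interval; increasing `num_trials` narrows it in proportion to the square root of the sample size.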
Here’s another option. Consider the following probability question: you have a biased coin that comes up heads with probability 1/n. You then flip the coin n times. What is the probability that you never flip heads? Well, that’s the probability that you flip tails n times, which is (1 - 1/n)^n, which as n tends towards infinity starts to rapidly approach 1/e. You could therefore estimate e by picking some modest value of n, simulating n tosses of a coin that comes up heads with probability 1/n, and seeing whether you never flip heads. The proportion of trials that don’t yield heads will approach 1/e, and from there you can estimate e.
For example, here's Python code to flip a coin with heads probability 1/n a total of n times (done by sampling a uniformly random number between 0 and 1) and see if all of them are tails:
from random import random

def one_trial(n):
    for i in range(n):
        if random() < 1 / n:
            return False
    return True
We can then run a large number of trials and see which fraction of them are all tails. That fraction will be approximately 1/e, so we just take the reciprocal:
def estimate_e(n, num_trials):
    successes = 0
    for i in range(num_trials):
        if one_trial(n):
            successes += 1
    return num_trials / successes
Doing this with n = 2^10 and num_trials = 2^20 gave me the estimate
e ≈ 2.7198016257969466,
which isn't too bad.

How to compute blot exposure in backgammon efficiently

I am trying to implement an algorithm for backgammon similar to td-gammon as described here.
As described in the paper, the initial version of td-gammon used only the raw board encoding in the feature space which created a good playing agent, but to get a world-class agent you need to add some pre-computed features associated with good play. One of the most important features turns out to be the blot exposure.
Blot exposure is defined here as:
For a given blot, the number of rolls out of 36 which would allow the opponent to hit the blot. The total blot exposure is the number of rolls out of 36 which would allow the opponent to hit any blot. Blot exposure depends on: (a) the locations of all enemy men in front of the blot; (b) the number and location of blocking points between the blot and the enemy men and (c) the number of enemy men on the bar, and the rolls which allow them to re-enter the board, since men on the bar must re-enter before blots can be hit.
I have tried various approaches to compute this feature efficiently but my computation is still too slow and I am not sure how to speed it up.
Keep in mind that the td-gammon approach evaluates every possible board position for a given dice roll, so on each turn, for every player's dice roll, you need to calculate this feature for every possible board position.
Some rough numbers: assuming there are approximately 30 board positions per turn and an average game lasts 50 turns, running 1,000,000 game simulations takes (x * 30 * 50 * 1,000,000) / (1000 * 60 * 60 * 24) days, where x is the number of milliseconds to compute the feature. Putting x = 0.7 we get approximately 12 days to simulate 1,000,000 games.
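The estimate above can be reproduced with a few lines of Python. The function name `training_days` and its default arguments are assumptions that simply restate the figures in the paragraph (30 positions per turn, 50 turns per game, 1,000,000 games).

```python
MS_PER_DAY = 1000 * 60 * 60 * 24


def training_days(ms_per_feature, positions_per_turn=30, turns_per_game=50,
                  games=1_000_000):
    """Days spent computing the feature alone, ignoring all other work."""
    total_ms = ms_per_feature * positions_per_turn * turns_per_game * games
    return total_ms / MS_PER_DAY


print(round(training_days(0.7), 2))
```

Because the cost is linear in x, halving the time per feature halves the total; at 0.57 ms the same run would take a little under 10 days.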
I don't really know if that's reasonable timing but I feel there must be a significantly faster approach.
So here's what I've tried:
Approach 1 (By dice roll)
For every one of the 21 possible dice rolls, recursively check whether a hit occurs. Here's the main workhorse for this procedure:
private bool HitBlot(int[] dieValues, Checker.Color checkerColor, ref int depth)
{
Moves legalMovesOfDie = new Moves();
if (depth < dieValues.Length)
{
legalMovesOfDie = LegalMovesOfDie(dieValues[depth], checkerColor);
}
if (depth == dieValues.Length || legalMovesOfDie.Count == 0)
{
return false;
}
bool hitBlot = false;
foreach (Move m in legalMovesOfDie.List)
{
if (m.HitChecker == true)
{
return true;
}
board.ApplyMove(m);
depth++;
hitBlot = HitBlot(dieValues, checkerColor, ref depth);
board.UnapplyMove(m);
depth--;
if (hitBlot == true)
{
break;
}
}
return hitBlot;
}
This function takes as input an array of dice values (e.g. if the player rolls 1,1 the array would be [1,1,1,1]). It then recursively checks whether there is a hit and, if so, exits with true. The function LegalMovesOfDie computes the legal moves for that particular die value.
Approach 2 (By blot)
With this approach I first find all the blots, and then for each blot I loop through every possible dice value and see if a hit occurs. The function is optimized so that once a dice value registers a hit I don't use it again for the next blot. It is also optimized to only consider moves that are in front of the blot. My code:
public int BlotExposure2(Checker.Color checkerColor)
{
if (DegreeOfContact() == 0 || CountBlots(checkerColor) == 0)
{
return 0;
}
List<Dice> unusedDice = Dice.GetAllDice();
List<int> blotPositions = BlotPositions(checkerColor);
int count = 0;
for(int i =0;i<blotPositions.Count;i++)
{
int blotPosition = blotPositions[i];
for (int j =unusedDice.Count-1; j>= 0;j--)
{
Dice dice = unusedDice[j];
Transitions transitions = new Transitions(this, dice);
bool hitBlot = transitions.HitBlot2(checkerColor, blotPosition);
if(hitBlot==true)
{
unusedDice.Remove(dice);
if (dice.ValuesEqual())
{
count = count + 1;
}
else
{
count = count + 2;
}
}
}
}
return count;
}
The method transitions.HitBlot2 takes a blotPosition parameter which ensures that only moves considered are those that are in front of the blot.
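The counting rule used above (a double adds 1, any other roll adds 2) is what turns the 21 distinct rolls into counts out of 36. A small Python sketch of just that bookkeeping follows; `distinct_rolls`, `exposure` and the toy predicate are hypothetical names for illustration, and the predicate is only a stand-in for real move generation such as HitBlot2.

```python
from itertools import combinations_with_replacement


def distinct_rolls():
    """Yield (die1, die2, weight) for the 21 distinct rolls of two dice."""
    for die1, die2 in combinations_with_replacement(range(1, 7), 2):
        weight = 1 if die1 == die2 else 2   # doubles occur once, other rolls twice
        yield die1, die2, weight


def exposure(hits):
    """Sum the weights of the rolls for which hits(die1, die2) is true."""
    return sum(w for d1, d2, w in distinct_rolls() if hits(d1, d2))


# Toy predicate (an assumption, not real backgammon logic): any roll showing a 4 hits.
print(exposure(lambda d1, d2: 4 in (d1, d2)))   # 11
print(exposure(lambda d1, d2: True))            # 36
```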
Both of these implementations were very slow, and a profiler showed that the recursion was the cause, so I tried refactoring them as follows:
To use for loops instead of recursion (ugly code, but much faster)
To use parallel.foreach so that instead of checking one dice value at a time I check them in parallel.
Here are the average timings for 50,000 computations of the feature (note that every approach was timed on the same data):
Approach 1 using recursion: 2.28 ms per computation
Approach 2 using recursion: 1.1 ms per computation
Approach 1 using for loops: 1.02 ms per computation
Approach 2 using for loops: 0.57 ms per computation
Approach 1 using parallel.foreach: 0.75 ms per computation
Approach 2 using parallel.foreach: 0.75 ms per computation
I've found the timings to be quite volatile (Maybe dependent on the random initialization of the neural network weights) but around 0.7 ms seems achievable which if you recall leads to 12 days of training for 1,000,000 games.
My questions are: Does anyone know if this is reasonable? Is there a faster algorithm I am not aware of that can reduce training?
One last piece of info: I'm running on a fairly new machine, an Intel Core(TM) i7-5500U CPU @ 2.40 GHz.
Any more info required please let me know and I will provide.
Thanks,
Ofir
Yes, calculating these features makes for really hairy code. Look at the GNU Backgammon code: find eval.c and look at lines 1008 to 1267. Yes, it's 260 lines of code. That code calculates the number of rolls that hit at least one checker, and also the number of rolls that hit at least 2 checkers. As you see, the code is hairy.
If you find a better way to calculate this, please post your results. To improve I think you have to look at the board representation. Can you represent the board in a different way that makes this calculation faster?

Algorithm to calculate sum of points for groups with varying member count [closed]

Let's start with an example. In Harry Potter, Hogwarts has 4 houses with students sorted into each house. The same happens on my website and I don't know how many users are in each house. It could be 20 in one house 50 in another and 100 in the third and fourth.
Now, each student can earn points on the website and at the end of the year, the house with the most points will win.
But it's not fair to "only" do a sum of the points, as the house with 100 students will have a much higher chance to win, since it has more users to earn points. So I need to come up with an algorithm which is fair.
You can see an example here: https://worldofpotter.dk/points
What I do now is to sum all the points for a house, and then divide it by the number of users who have earned more than 10 points. This is still not fair, though.
Any ideas on how to make this calculation more fair?
Things we need to take into account:
* The percent of users earning points in each house
* Few users earning LOTS of points
* Many users earning FEW points (It's not bad earning few points. It still counts towards the total points of the house)
Link to MySQL dump(with users, houses and points): https://worldofpotter.dk/wop_points_example.sql
Link to CSV of points only: https://worldofpotter.dk/points.csv
I'd use something like Discounted Cumulative Gain which is used for measuring the effectiveness of search engines.
The concept is as follows:
FUNCTION evalHouseScore (0_INDEXED_SORTED_ARRAY scores):
    score = 0;
    FOR (int i = 0; i < scores.length; i++):
        score += scores[i]/log2(i+2); // i+2 avoids log2(0) and log2(1) = 0 for the first two entries
    END_FOR
    RETURN score;
END_FUNCTION;
This must be modified somehow, because this way of measuring puts most of the weight on the first result. As this is subjective, you should decide for yourself how to modify it. Below is the code with some constants, K and L, that you should try with different values (K must stay greater than 1 so that log2(i+K) is positive for i = 0):
FUNCTION evalHouseScore (0_INDEXED_SORTED_ARRAY scores):
    score = 0;
    FOR (int i = 0; i < scores.length; i++):
        score += scores[i]/log2(i+K);
    END_FOR
    RETURN L*score;
END_FUNCTION
Consider changing the logarithm.
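As a runnable illustration, the pseudocode with the two constants might look like this in Python. The parameter names `k` and `scale` stand in for K and L, and the default values are assumptions; the function sorts its input so callers need not pre-sort.

```python
import math


def eval_house_score(scores, k=2, scale=1.0):
    """Sum the scores in descending order, each divided by log2(index + k)."""
    if k <= 1:
        raise ValueError("k must be greater than 1 so that log2(i + k) is positive")
    ranked = sorted(scores, reverse=True)
    return scale * sum(s / math.log2(i + k) for i, s in enumerate(ranked))


print(eval_house_score([8, 4, 2]))
print(eval_house_score([8, 4, 2], k=10))
```

A larger `k` flattens the discount, so the long tail of low scorers contributes relatively more to the house total.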
Tests:
int[] g = new int[] {758,294,266,166,157,132,129,116,111,88,83,74,62,60,60,52,43,40,28,26,25,24,18,18,17,15,15,15,14,14,12,10,9,5,5,4,4,4,4,3,3,3,2,1,1,1,1,1};
int[] s = new int[] {612,324,301,273,201,182,176,139,130,121,119,114,113,113,106,86,77,76,65,62,60,58,57,54,54,42,42,40,36,35,34,29,28,23,22,19,17,16,14,14,13,11,11,9,9,8,8,7,7,7,6,4,4,3,3,3,3,2,2,2,2,2,2,2,1,1,1};
int[] h = new int[] {813,676,430,382,360,323,265,235,192,170,107,103,80,70,60,57,43,41,21,17,15,15,12,10,9,9,9,8,8,6,6,6,4,4,4,3,2,2,2,1,1,1};
int[] r = new int[] {1398,1009,443,339,242,215,210,205,177,168,164,144,144,92,85,82,71,61,58,47,44,33,21,19,18,17,12,11,11,9,8,7,7,6,5,4,3,3,3,3,2,2,2,1,1,1,1};
The output is for different offsets:
1182
1543
1847
2286
904
1231
1421
1735
813
1120
1272
1557
It sounds like some sort of constraint between the houses may need to be introduced. I might suggest finding the person that earned the most points out of all the houses and using it as the denominator when rolling up the scores. This will guarantee the max value of a user's contribution is 1, then all the scores for a house can be summed and then divided by the number of users to normalize the house's score. That should give you a reasonable comparison. It does introduce issues with low numbers of users in a house that are high achievers in which you may want to consider lower limits to the number of house members. Another technique may be to introduce handicap scores for users to balance the scales. The algorithm will most likely flex over time based on the data you receive. To keep it fair it will take some responsive action after the initial iteration. Players can come up with some creative ways to make scoring systems work for them. Here is some pseudo-code in PHP that you may use:
<?php
$mostPointsEarned; // Find the user that earned the most points
$houseScores = [];
foreach ($houses as $house) {
$numberOfUsers = 0;
$normalizedScores = [];
foreach ($house->getUsers() as $user) {
$normalizedScores[] = $user->getPoints() / $mostPointsEarned;
$numberOfUsers++;
}
$houseScores[] = array_sum($normalizedScores) / $numberOfUsers;
}
var_dump($houseScores);
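The same normalisation can be sketched in runnable form in Python. The dictionary layout and the name `normalized_house_scores` are assumptions for illustration; the sketch also assumes every house has at least one member and that the top score is greater than zero.

```python
def normalized_house_scores(houses):
    """Map each house to the mean of its members' points divided by the global maximum."""
    most_points = max(p for points in houses.values() for p in points)
    return {
        name: sum(p / most_points for p in points) / len(points)
        for name, points in houses.items()
    }


houses = {"A": [100, 50], "B": [80, 80, 20]}
print(normalized_house_scores(houses))
```

Each house score lands between 0 and 1, so houses of different sizes can be compared directly.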
You haven't given any examples on what should be preferred state, and what are situations against which you want to be immune. (3,2,1,1 compared to 5,2 etc.)
It's also a pity you haven't provided us the dataset in some nice way to play.
scala> val input = Map( // as seen on 2016-09-09 14:10 UTC on https://worldofpotter.dk/points
'G' -> Seq(758,294,266,166,157,132,129,116,111,88,83,74,62,60,60,52,43,40,28,26,25,24,18,18,17,15,15,15,14,14,12,10,9,5,5,4,4,4,4,3,3,3,2,1,1,1,1,1),
'S' -> Seq(612,324,301,273,201,182,176,139,130,121,119,114,113,113,106,86,77,76,65,62,60,58,57,54,54,42,42,40,36,35,34,29,28,23,22,19,17,16,14,14,13,11,11,9,9,8,8,7,7,7,6,4,4,3,3,3,3,2,2,2,2,2,2,2,1,1,1),
'H' -> Seq(813,676,430,382,360,323,265,235,192,170,107,103,80,70,60,57,43,41,21,17,15,15,12,10,9,9,9,8,8,6,6,6,4,4,4,3,2,2,2,1,1,1),
'R' -> Seq(1398,1009,443,339,242,215,210,205,177,168,164,144,144,92,85,82,71,61,58,47,44,33,21,19,18,17,12,11,11,9,8,7,7,6,5,4,3,3,3,3,2,2,2,1,1,1,1)
) // and the results on the website were: 1. R 1951, 2. H 1859, 3. S 990, 4. G 954
Here is what I thought of:
def singleValuedScore(individualScores: Seq[Int]) = individualScores
.sortBy(-_) // sort from most to least
.zipWithIndex // add indices e.g. (best, 0), (2nd best, 1), ...
.map { case (score, index) => score * (1 + index) } // here is the 'logic'
.max
input.mapValues(singleValuedScore)
res: scala.collection.immutable.Map[Char,Int] =
Map(G -> 1044,
S -> 1590,
H -> 1968,
R -> 2018)
The overall positions would be:
Ravenclaw with 2018 aggregated points
Hufflepuff with 1968
Slytherin with 1590
Gryffindor with 1044
This matches the ordering on the website: 1. R 1951, 2. H 1859, 3. S 990, 4. G 954.
The algorithm's output is the maximal product of a user's score and that user's rank within the house.
This measure is not affected by "long-tail" of users having low score compared to the active ones.
There are no hand-set cutoffs or thresholds.
You could experiment with the rank attribution (score * index or score * Math.sqrt(index) or score / Math.log(index + 1) ...)
I take it that the fair measure is the number of points divided by the number of house members. Since you have the number of points, the exercise boils down to estimate the number of members.
We are in short supply of data here as the only hint we have on member counts is the answers on the website. This makes us vulnerable to manipulation, members can trick us into underestimating their numbers. If the suggested estimation method to "count respondents with points >10" would be known, houses would only encourage the best to do the test to hide members from our count. This is a real problem and the only thing I will do about it is to present a "manipulation indicator".
How could we then estimate member counts? Since we do not know anything other than test results, we have to infer the propensity to do the test from the actual results. And we have little other to assume than that we would have a symmetric result distribution (of the logarithm of the points) if all members tested. Now let's say the strong would-be respondents are more likely to actually test than weak would-be respondents. Then we could measure the extra dropout ratio for the weak by comparing the numbers of respondents in corresponding weak and strong test-point quantiles.
To be specific, of the 205 answers, there are 27 in the worst half of the overall weakest quartile, while 32 in the strongest half of the best quartile. So an extra 5 respondents of the very weakest have dropped out from an assumed all-testing symmetric population, and to adjust for this, we are going to estimate member count from this quantile by multiplying the number of responses in it by 32/27=about 1.2. Similarly, we have 29/26 for the next less-extreme half quartiles and 41/50 for the two mid quartiles.
So we would estimate members by simply counting the number of respondents but multiplying the number of respondents in the weak quartiles mentioned above by 1.2, 1.1 and 0.8 respectively. If however any result distribution within a house would be conspicuously skewed, which is not the case now, we would have to suspect manipulation and re-design our member count.
For the sample at hand, however, these adjustments to member counts are minor and yield the same house ranks as simply counting the respondents without adjustments.
I amused myself a little with your question, some Python programming and some randomly generated data. As some people mentioned in the comments, you need to define what fairness is. If, as you said, you don't know the number of people in each house, you can use the number of participations of each house, which motivates participation (it can be unfair depending on the number of people in each house, but as you said you don't have this data in the first place).
The important part of the code is the following.
import numpy as np
from numpy.random import randint  # import random int

# initialize random seed
np.random.seed(4)
houses = ["Gryffindor", "Slytherin", "Hufflepuff", "Ravenclaw"]
houses_points = []
# generate random data for each house
for _ in houses:
    # houses_points.append(randint(0, 100, randint(60, 100)))
    houses_points.append(randint(0, 50, randint(2, 10)))
# count participation
houses_participations = []
houses_total_points = []
for house_id in range(len(houses)):
    houses_total_points.append(np.sum(houses_points[house_id]))
    houses_participations.append(len(houses_points[house_id]))
# sum the total number of participations
total_participations = np.sum(houses_participations)
# proposed model with weighted total participation points
houses_partic_points = []
for house_id in range(len(houses)):
    # // keeps the integer division that / performed in the original Python 2 code
    tmp = houses_total_points[house_id] * houses_participations[house_id] // total_participations
    houses_partic_points.append(tmp)
The results of this method are the following:
House Points per Participant
Gryffindor: [46 5 1 40]
Slytherin: [ 8 9 39 45 30 40 36 44 38]
Hufflepuff: [42 3 0 21 21 9 38 38]
Ravenclaw: [ 2 46]
House Number of Participations per House
Gryffindor: 4
Slytherin: 9
Hufflepuff: 8
Ravenclaw: 2
House Total Points
Gryffindor: 92
Slytherin: 289
Hufflepuff: 172
Ravenclaw: 48
House Points weighted by a participation factor
Gryffindor: 16
Slytherin: 113
Hufflepuff: 59
Ravenclaw: 4
You'll find the complete file with printing results here (https://gist.github.com/silgon/5be78b1ea0b55a20d90d9ec3e7c515e5).
You should enter some more rules to define the fairness.
Idea 1
You could set up the rule that anyone has to earn at least 10 points to enter the competition.
Then you can calculate the average points for each house.
Positive: Everyone needs to show some motivation.
Idea 2
Another approach would be to set the rule that from each house only the 10 best students will count for the competition.
Positive: Easy rule to calculate the points.
Negative: Students might become uninterested if they see they can't reach the top 10 places of their house.
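Both ideas are short enough to sketch in Python. The function names, the threshold of 10 points and the cut-off of 10 students mirror the text above, but the code itself is only an illustrative assumption about how they could be computed.

```python
def average_of_qualified(points, minimum=10):
    """Idea 1: average over members who earned at least `minimum` points."""
    qualified = [p for p in points if p >= minimum]
    return sum(qualified) / len(qualified) if qualified else 0.0


def top_n_total(points, n=10):
    """Idea 2: total of the n best members of a house."""
    return sum(sorted(points, reverse=True)[:n])


house = [120, 40, 15, 9, 3]
print(average_of_qualified(house))
print(top_n_total(house, n=3))
```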
From my point of view, your problem divides into a few points:
The best thing to do would be to reassign the players among the Houses so that each House has the same number of players (as explained by @navid-vafaei).
You may not want to do that if you believe it would hurt your game's popularity with players who end up in a House they don't want; in the movies and books, at least, the choice of the Sorting Hat can be changed.
In that case, you can sum the points of a House's students and divide by the number of students. You could exclude students with a very low score, and also students with very low activity, because students who skip school might be expelled.
The most important part of your algorithm, for me, is whether or not you give points for all valuable things:
In the Harry Potter story, students earn points in the different subjects they chose at school, according to their scores.
At the end of the year there is a special award event, at which the Headmaster gives points for valuable things that cannot be evaluated in school subjects, such as personal qualities (bravery, for example).

Algorithm to give more weight to the first word

Right now, I'm trying to create an algorithm that gives a score to a user, depending on his input in a text field.
This score is supposed to encourage the user to add more text to his personal profile.
The way the algorithm should work, is that it should account a certain weight to the first word, and a little less weight to the second word. The third word will receive a little less weight than the second word, and so on.
The goal is to encourage users to expand their texts, but to avoid spam in general as well. For instance, the added value of the 500th word shouldn't be much at all.
The difference between a text of 100 words and a text of 500 words should be substantial.
Am I making any sense so far?
Right now, I wouldn't know where to begin with this question. I've tried multiple Google queries, but didn't seem to find anything of the sort. Can anyone point me in the right direction?
I suppose such an algorithm must already exist somewhere (or at least the general idea probably exists) but I can't seem to be able to find some help on the subject.
I'd really appreciate any help you can give me.
Thanks a lot.
// word count in user description
double word_count = ...;
// word limit over which words do not improve score
double word_limit = ...;
// use it to change score progression curve
// if factor = 1, progression is linear
// if factor < 1, progression is steeper at the beginning
// if factor > 1, progression is steeper at the end
double factor = ...;
double score = pow(min(word_count, word_limit) / word_limit, factor);
It depends how complex you want/need it to be, and whether or not you want a constant reduction in the weight applied to a particular word.
The simplest would possibly be to apply a relatively high weight (say 1000) to the first word, and then each subsequent word has a weight one less than the weight of the previous word; so the second word has a weight of 999, the third word has a weight of 998, etc. That has the "drawback" that the sum of the weights doesn't increase past the 1000 word mark - you'll have to decide for yourself whether or not that's bad for your particular situation. That may not do exactly what you need to do, though.
If you don't want a linear reduction, it could be something simple such as the first word has a weight of X, the second word has a weight equal to Y% of X, the third word has a weight equal to Y% of Y% of X, etc. The difference between the first and second word is going to be larger than the difference between the second and third word, and by the time you reach the 500th word, the difference is going to be far smaller. It's also not difficult to implement, since it's not a complex formula.
Or, if you really need to, you could use a more complex mathematical function to calculate the weight - try googling 'exponential decay' and see if that's of any use to you.
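The "Y% of the previous weight" scheme is a geometric series, so the total score for a text has a closed form and no loop over the words is needed. A minimal Python sketch follows; the name `geometric_score` and the default values (first weight 10, ratio 0.99) are assumptions to be tuned.

```python
def geometric_score(word_count, first_weight=10.0, ratio=0.99):
    """Total of the weights first_weight, first_weight*ratio, first_weight*ratio**2, ..."""
    if not 0 < ratio < 1:
        raise ValueError("ratio must be strictly between 0 and 1")
    return first_weight * (1 - ratio ** word_count) / (1 - ratio)


for words in (1, 100, 500):
    print(words, round(geometric_score(words), 1))
```

The score can never exceed first_weight / (1 - ratio), which caps the reward for very long texts; a ratio closer to 1 makes later words matter more.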
It is not very difficult to implement a custom scoring function. Here is one in pseudo code:
function GetScore( word_count )
    // no points for the lazy user
    if word_count == 0
        return 0
    // 20 points for the first word and then up to 90 points linearly:
    else if word_count >= 1 and word_count <= 100
        return 20 + 70 * (word_count - 1) / (99)
    // 90 points for the first 100 words and then up to 100 points linearly:
    else if word_count >= 101 and word_count <= 1000
        return 90 + 10 * (word_count - 100) / (900)
    // 100 points is the maximum for 1000 words or more:
    else
        return 100
end function
I would go with something like result = 2*sqrt(words_count); in any case, you can use any function that has a derivative less than 1, e.g. log.

Resources