How can this algorithm be expressed in programming for a calculation?

Is it possible for someone to make this algorithm easy enough to understand that I can enter it as a calculation in my application?
Thanks
To explain, the formula in question is:
x = (P0 * i) / (1 - (1 + i)^(-n))
where x is the monthly payment, P0 is the total loan amount, i is the interest rate per month, and n is the number of months remaining.

Yes, that algorithm can certainly be expressed in a programming language. I won't provide a full implementation, but here is some pseudocode to get you started.
So the first thing we see is that x deals with money, so let's represent it as a double:
double monthlyPayment // this is x
Next we see that P sub 0 is the total loan amount, so let's also represent that as a double:
double loanAmount // this is P sub 0
Next we see that i is the interest rate, so this will also be a double:
double interestRate // this is i
Next we see that n is the number of months that remain on the loan; this is an integer:
int monthsRemaining // this is n
So, looking at the formula you proposed, we get the following (note the parentheses around the whole denominator, which are easy to drop):
monthlyPayment = (loanAmount * interestRate) / (1 - (1 + interestRate) ^ (-monthsRemaining))
Now without actually implementing this I do believe you can take it from here.

This is how you could represent it using JavaScript:
function repayment(amount, months, interest) {
  // interest is the monthly rate as a decimal fraction, e.g. 0.005 for 0.5% per month
  return (amount * interest) / (1 - Math.pow(1 + interest, -months));
}

Are you looking for an explanation of the equation? It deals with loans. When you take out a loan, you borrow a certain amount of money (P) for a certain number of months (n). In addition, to make the loan worthwhile to the lender, you are charged an interest rate (i). That's a percentage you are charged every month on the remaining loan amount.
So, you need to calculate how much you have to pay each month (x) in order to have the loan paid off in the set amount of time (n months). It's not as simple as just dividing the loan amount (P) by the number of months (n), because you also have to pay interest.
So, the equation is giving you the amount of money you have to pay each month in order to pay off the original loan plus any interest.
If you're using Java:
public double calculateMonthlyPayment(double originalLoan, double interestRate, double monthsToRepay) {
    return (originalLoan * interestRate) / (1 - Math.pow(1 + interestRate, -monthsToRepay));
}


Finding Cheapest Combination of Tickets

I'm writing a web application to help commuters determine the cheapest combination of tickets to purchase for their daily commute based on how many trips they take in a month.
I've done a bunch of reading on this, but I'm still not quite sure how I would implement this. I think it could be similar to the knapsack problem or some other questions on here, but I still find the answers quite confusing.
There are three ticket types:
Single Trip: $6.50
10 Trips: $49.80
Monthly Pass: $149.40
I want to be able to compute the combination of tickets that has the lowest total price which covers a given number of monthly trips for a commuter.
Example:
For 12 trips in a month, the cheapest combination is one 10 Trip pass and two Single passes for a total of $62.80.
Is anyone able to point me in the right direction (with pseudocode or otherwise) on how I would implement this? Thanks in advance! :)
Explanation
As sascha suggested in their comment, I was actually able to implement this fairly simply.
With n as the total number of trips:
The number of ten-trip tickets to use is (n / 10) using integer division
The number of single tickets to use is (n % 10)
If the total cost of these tickets is greater than the cost of a monthly ticket, use a monthly ticket.
The Code
const singlePrice = 6.50;
const tenTripPrice = 49.80;
const monthlyPrice = 149.40;
const TicketType = { Single: "Single", TenTrip: "TenTrip", Monthly: "Monthly" };

function cheapestTickets(totalNumTrips) {
  const ticketsToBuy = [];
  let cost = 0;
  // Buy as many ten-trip tickets as fit.
  for (let i = 0; i < Math.floor(totalNumTrips / 10); i++) {
    ticketsToBuy.push(TicketType.TenTrip);
    cost += tenTripPrice;
  }
  // Cover the remaining trips with singles.
  for (let i = 0; i < totalNumTrips % 10; i++) {
    ticketsToBuy.push(TicketType.Single);
    cost += singlePrice;
  }
  // A monthly pass caps the total.
  if (monthlyPrice < cost) {
    return {
      tickets: [TicketType.Monthly],
      cost: monthlyPrice,
    };
  }
  return {
    tickets: ticketsToBuy,
    cost: cost,
  };
}
Hopefully this is helpful to someone!
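One caveat worth knowing: the greedy scheme above can overpay around the ten-trip boundary. For example, 8 trips cost 8 x $6.50 = $52.00 in singles, which is more than a $49.80 ten-trip ticket. A small dynamic-programming pass, in the spirit of the knapsack/coin-change approaches mentioned in the question, covers those cases too. A sketch in Python (function and variable names are mine):

def cheapest_tickets(trips):
    prices = {"single": 6.50, "ten_trip": 49.80}
    covers = {"single": 1, "ten_trip": 10}
    monthly = 149.40
    # best[t] = (cost, tickets) to cover at least t trips
    best = [(0.0, [])] + [(float("inf"), [])] * trips
    for t in range(1, trips + 1):
        for name, n in covers.items():
            cost, tickets = best[max(t - n, 0)]
            if cost + prices[name] < best[t][0]:
                best[t] = (cost + prices[name], tickets + [name])
    cost, tickets = best[trips]
    # a monthly pass caps the total
    return (monthly, ["monthly"]) if monthly <= cost else (cost, tickets)

print(cheapest_tickets(8))   # (49.8, ['ten_trip']) -- cheaper than 8 singles
print(cheapest_tickets(12))  # (62.8, ['ten_trip', 'single', 'single'])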

Algorithm for fair distribution of revenue

I am after an algorithm (preferably abstract or in very clear Python or PHP code) that allows for a fair distribution of revenue both in the short and in the long term, based on the following constraints:
Each incoming amount will be an integer.
Each distribution will be also an integer amount.
Each person in the distribution is set to receive a cut of the revenue expressed as a fixed fraction. These fractions can of course be put under a common denominator (think integer percentages with a denominator of 100). The sum of all numerators equals the denominator (100 in the case of percentages)
To make it fair in the short term, person i must be receiving either floor(revenue*cut[i]) or ceil(revenue*cut[i]). EDIT: Let me emphasize that ceil(revenue*cut[i]) is not necessarily equal to 1+floor(revenue*cut[i]), which is where some algorithms fail including one presented (see my comment on the answer by Andrey).
To make it fair in the long term, as payments accumulate, the actual cuts received must be as close to the theoretical cuts as possible, preferably always under 1 unit. In particular, if the total revenue is a multiple of the denominator of the common fraction, every person should have received the corresponding multiple of their numerator.
[Edit] Forgot to add that the total amount distributed every time must of course add up to the incoming amount received.
Here's an example for clarity. Say there are three people among which to distribute revenue, and one has to get 31%, another has to get 21%, and the third has to get 100-31-21=48%.
Now imagine that there's an income of 80 coins. The first person should receive either floor(80*31/100) or ceil(80*31/100) coins, the second person should receive either floor(80*21/100) or ceil(80*21/100) and the third person, either floor(80*48/100) or ceil(80*48/100).
And now imagine that there's a new income of 120 coins. Each person should again receive either the floor or the ceiling of their corresponding cuts, but since the total revenue (200) is a multiple of the denominator (100), each person should have now their exact cuts: the first person should have 62 coins, the second person should have 42 coins, and the third person should have 96 coins.
I think I have an algorithm figured out for the case of two people to distribute the revenue among. It goes like this (for 35% and 65%):
set TotalRevenue to 0
set TotalPaid[0] to 0
{ TotalPaid[1] is implied and not necessary for the algorithm }
set Cut[0] to 35
{ Cut[1] is implied and not necessary for the algorithm }
set Denominator to 100
each time a payment is received:
    let Revenue be the amount received this time
    set TotalRevenue = TotalRevenue + Revenue
    set tmp = floor(TotalRevenue*Cut[0]/Denominator)
    pay person 0 the amount of tmp - TotalPaid[0]
    pay person 1 the amount of Revenue - (tmp - TotalPaid[0])
    set TotalPaid[0] = tmp
I think that this simple algorithm meets all of my constraints, but I have not found the way to extend it to more than two people without violating at least one. Can anyone figure out an extension to multiple people (or perhaps prove its impossibility)?
I think this works:
Keep track of a totalReceived, the total amount of money that has entered the system.
When money is added to the system, add it to totalReceived.
Set person[i].newTotal = floor(totalReceived * cut[i]). Edit: had error here; thanks commenters. This distributes some of the money. Keep track of how much, to know how much "new money" is left over. To be clear, we are keeping track of how much of the total amount of money is left over, and intuitively we re-distribute the whole sum at every step, although actually nobody's fund ever goes down between one step and the next.
You need to know how to distribute the left-over integer amount of money: the leftover is always less than numPeople. For i ranging from 0 to leftOverMoney - 1, increment person[i].newTotal by 1. Effectively we give a dollar to each of the first leftOverMoney people.
To compute how much a person "received" during this step, compute person[i].newTotal - person[i].total.
To clean/finish up, set person[i].total = person[i].newTotal.
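A minimal Python sketch of this scheme, using the integer numerator/denominator form from the question to avoid floating-point surprises (names are mine; as written it always hands the leftover units to the first people in the list, so it implements this answer as described rather than the error-sorted variant below):

def distribute(payment, numerators, denominator, state):
    # state holds the running totals: total_received and per-person totals
    state["total_received"] += payment
    new_totals = [state["total_received"] * n // denominator for n in numerators]
    leftover = state["total_received"] - sum(new_totals)
    for i in range(leftover):       # one unit each to the first few people
        new_totals[i] += 1
    payouts = [new - old for new, old in zip(new_totals, state["totals"])]
    state["totals"] = new_totals
    return payouts

state = {"total_received": 0, "totals": [0, 0, 0]}
print(distribute(80, [31, 21, 48], 100, state))   # [25, 17, 38] -- sums to 80
print(distribute(120, [31, 21, 48], 100, state))  # totals end up [62, 42, 96]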
To accommodate constraint 4, the current payment must be used as the basis for distribution. After distributing the rounded dividends to the investors, the leftover dollars may be awarded, one each, to any eligible investors. An investor is eligible if he is due a fractional share, that is, ceil(revenue*cut[i]) > floor(revenue*cut[i]).
Constraint 5 may be addressed by deciding who, among the eligible investors, should get the leftover dollars. Maximum long-term fairness will be most closely approached by always awarding the leftover dollars to the eligible investors who have accumulated the most total (long-term) distribution error.
The following is derived from the algorithm I used for my fund manager clients. (I always used Pascal for financial applications for forensic reasons.) I have transposed the code fragments from a database application to an array-based method in an attempt to improve the clarity.
Included in the accounting data structures is a table of investors. Included in the information about each investor is his share fraction (investor share count divided by fund total shares), amount earned to date, and amount distributed (actually paid to investor) to date.
Investors : array [0..InvestorCount-1] of record
  // ... lots of information about this investor
  ShareFraction : float; // >0 and <1
  EarnedToDate : currency; // rounded to dollars
  DistributedToDate : currency; // rounded to dollars
end;
// The sum of all the .ShareFraction in the table is exactly 1.
Some temporary data is created for these calculations and freed afterwards.
DistErrs : array [0..InvestorCount-1] of record
  TotalError : float; // distribution rounding error
  CurrentError : float; // distribution rounding error
  InvestorIndex : integer; // link to Investors record
end;
UNDISTRIBUTED_PAYMENTS : currency;
CurrentFloat : float;
CurrentInteger : currency;
CurrentFraction : float;
TotalFloat : float;
TotalInteger : currency;
TotalFraction : float;
DD, II : integer; // array indices
FUND_PREVIOUS_PAYMENTS : currency; is a parameter, derived from the fund history. If it is not provided as a parameter it may be calculated as the sum over all Investors[].EarnedToDate.
FUND_CURRENT_PAYMENT : currency; is a parameter, derived from the current increase in fund value.
FUND_TOTAL_PAYMENTS : currency; is a parameter, derived from the current fund value. This is the fund total amount previously paid out plus the fund current payment.
These three values form a dependent system with two degrees of freedom. Any two may be provided and the third calculated from the others.
// Calculate how much each investor has earned after the fund current payment.
UNDISTRIBUTED_PAYMENTS := FUND_CURRENT_PAYMENT;
for II := 0 to InvestorCount-1 do begin
  // Calculate theoretical current dividend whole and fraction parts.
  CurrentFloat := FUND_CURRENT_PAYMENT * Investors[II].ShareFraction;
  CurrentInteger := floor(CurrentFloat);
  CurrentFraction := CurrentFloat - CurrentInteger;
  // Calculate theoretical total dividend whole and fraction parts.
  TotalFloat := FUND_TOTAL_PAYMENTS * Investors[II].ShareFraction;
  TotalInteger := floor(TotalFloat);
  TotalFraction := TotalFloat - TotalInteger;
  // Distribute the current whole dollars to the investor.
  Investors[II].EarnedToDate := Investors[II].EarnedToDate + CurrentInteger;
  // Track total whole dollars distributed.
  UNDISTRIBUTED_PAYMENTS := UNDISTRIBUTED_PAYMENTS - CurrentInteger;
  // Record the fractional dollars in the errors table and link to investor.
  DistErrs[II].CurrentError := CurrentFraction;
  DistErrs[II].TotalError := TotalFraction;
  DistErrs[II].InvestorIndex := II;
end;
At this point UNDISTRIBUTED_PAYMENTS is always less than InvestorCount.
// Sort DistErrs by .TotalError descending.
In the real world the data is kept in a database not in arrays, so sorting is done by creating an index on DistErrs using TotalError as a key.
// Distribute one dollar each to the first UNDISTRIBUTED_PAYMENTS eligible
// investors (i.e. those whose current share ceilings are higher than their
// current share floor), in order from greatest to least .TotalError.
for DD := 0 to InvestorCount-1 do begin
  if UNDISTRIBUTED_PAYMENTS <= 0 then break;
  if DistErrs[DD].CurrentError <> 0 then begin
    II := DistErrs[DD].InvestorIndex;
    Investors[II].EarnedToDate := Investors[II].EarnedToDate + 1;
    UNDISTRIBUTED_PAYMENTS := UNDISTRIBUTED_PAYMENTS - 1;
  end;
end;
In a subsequent process, each investor is paid the difference between his .EarnedToDate and his .DistributedToDate, and his .DistributedToDate is adjusted to reflect that payment.

Algorithm For Ranking Items

I have a list of 6500 items that I would like to trade or invest in. (Not for real money, but for a certain game.) Each item has 5 numbers that will be used to rank it among the others.
Total quantity of item traded per day: The higher this number, the better.
The Donchian Channel of the item over the last 5 days: The higher this number, the better.
The median spread of the price: The lower this number, the better.
The spread of the 20 day moving average for the item: The lower this number, the better.
The spread of the 5 day moving average for the item: The higher this number, the better.
All 5 numbers have the same 'weight'; in other words, they should all affect the final number with the same worth or value.
At the moment, I just multiply all 5 numbers for each item, but it doesn't rank the items the way I would like them to be ranked. I just want to combine all 5 numbers into a weighted number that I can use to rank all 6500 items, but I'm unsure of how to do this correctly or mathematically.
Note: The total quantity of the item traded per day and the Donchian channel are numbers that are much higher than the spreads, which are more percentage-type numbers. This is probably why multiplying them all together didn't work for me; the quantity traded per day and the Donchian channel had a much bigger role in the final number.
The reason people are having trouble answering this question is we have no way of comparing two different "attributes". If there were just two attributes, say quantity traded and median price spread, would (20million,50%) be worse or better than (100,1%)? Only you can decide this.
Converting everything onto the same scale can help; this is what is known as "normalisation". A good way of doing this is the z-score which Prasad mentions. This is a statistical concept that looks at how a quantity varies. You need to make some assumptions about the statistical distributions of your numbers to use it.
Things like spreads are probably normally distributed. For these, as Prasad says, take z(spread) = (spread - mean(spreads)) / standardDeviation(spreads).
Things like the quantity traded might follow a power-law distribution. For these you might want to take the log() before calculating the mean and sd. That is, the z-score is z(qty) = (log(qty) - mean(log(quantities))) / sd(log(quantities)).
Then just add up the z-scores for each attribute.
To do this for each attribute you will need to have an idea of its distribution. You could guess, but the best way is to plot a graph and have a look. You might also want to plot graphs on log scales. See Wikipedia for a long list of distributions.
You can replace each attribute-vector x (of length N = 6500) by the z-score of the vector Z(x), where
Z(x) = (x - mean(x))/sd(x).
This would transform them into the same "scale", and then you can add up the Z-scores (with equal weights) to get a final score, and rank the N=6500 items by this total score. If you can find in your problem some other attribute-vector that would be an indicator of "goodness" (say the 10-day return of the security?), then you could fit a regression model of this predicted attribute against these z-scored variables, to figure out the best non-uniform weights.
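A short sketch of that recipe in Python/NumPy (the numbers are made up; columns where lower is better are negated first so that a higher z-score is always better, and the heavy-tailed quantity column is log-transformed as suggested above):

import numpy as np

# one row per item, one column per attribute (hypothetical data)
X = np.array([
    [20_000_000.0, 120.0, 0.50, 0.30, 0.15],
    [       100.0,  80.0, 0.01, 0.05, 0.40],
    [    50_000.0, 200.0, 0.10, 0.10, 0.25],
])
X[:, 0] = np.log(X[:, 0])              # heavy-tailed: log before z-scoring
signs = np.array([1, 1, -1, -1, 1])    # +1 where higher is better
Xs = X * signs
Z = (Xs - Xs.mean(axis=0)) / Xs.std(axis=0)
scores = Z.sum(axis=1)                 # equal weights
ranking = np.argsort(-scores)          # indices of items, best first
print(scores, ranking)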
Start each item with a score of 0. For each of the 5 numbers, sort the list by that number and add each item's ranking in that sorting to its score. Then, just sort the items by the combined score.
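That rank-sum idea in Python, for concreteness (a sketch; ties receive arbitrary adjacent ranks, and the flags say which direction is better for each of the 5 numbers):

def rank_scores(items, higher_is_better):
    # items: list of attribute tuples; one direction flag per attribute
    scores = [0] * len(items)
    for attr, hib in enumerate(higher_is_better):
        order = sorted(range(len(items)),
                       key=lambda i: items[i][attr], reverse=not hib)
        for rank, i in enumerate(order):   # rank 0 = worst on this attribute
            scores[i] += rank
    return scores  # sort the items by this score, descending

items = [(5000, 12, 0.5, 0.3, 0.2), (300, 20, 0.1, 0.1, 0.4)]
print(rank_scores(items, [True, True, False, False, True]))  # [1, 4]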
You would usually normalize your data entries to their respective range. Since there is no fixed range for them, you'll have to use a sliding range - or, to keep it simpler, normalize them to the daily ranges.
For each day, get all entries for a given type, get the highest and the lowest of them, determine the difference between them. Let Bottom=value of the lowest, Range=difference between highest and lowest. Then you calculate for each entry (value - Bottom)/Range, which will result in something between 0.0 and 1.0. These are the numbers you can continue to work with, then.
Pseudocode (brackets replaced by indentation to make it easier to read):
double maxvalues[5];
double minvalues[5];
// init arrays with any item
for (i = 0; i < 5; i++)
    maxvalues[i] = items[0][i];
    minvalues[i] = items[0][i];
// find minimum and maximum values
foreach (items as item)
    for (i = 0; i < 5; i++)
        if (minvalues[i] > item[i])
            minvalues[i] = item[i];
        if (maxvalues[i] < item[i])
            maxvalues[i] = item[i];
// now scale them - in this case, to the range of 0 to 1.
double scaledItems[sizeof(items)][5];
for (i = 0; i < 5; i++)
    double delta = maxvalues[i] - minvalues[i];
    for (j = sizeof(items) - 1; j >= 0; --j)
        scaledItems[j][i] = (items[j][i] - minvalues[i]) / delta; // linear normalization
Something like that. It would be more elegant with a good library (STL, Boost, whatever is available on the implementation platform), and the normalization should be in a separate function, so you can replace it with other variations like log() as the need arises.
Total quantity of item traded per day: The higher this number, the better. (a)
The Donchian Channel of the item over the last 5 days: The higher this number, the better. (b)
The median spread of the price: The lower this number, the better. (c)
The spread of the 20 day moving average for the item: The lower this number, the better. (d)
The spread of the 5 day moving average for the item: The higher this number, the better. (e)
a + b - c - d + e = "score" (higher score = better score)

What is a better way to sort by a 5 star rating?

I'm trying to sort a bunch of products by customer ratings using a 5-star system. The site I'm setting this up for does not have a lot of ratings, and it continues to add new products, so it will usually have a few products with a low number of ratings.
I tried using average star rating but that algorithm fails when there is a small number of ratings.
For example, a product that has 3x 5-star ratings would show up better than a product that has 100x 5-star ratings and 2x 2-star ratings.
Shouldn't the second product show up higher because it is statistically more trustworthy because of the larger number of ratings?
Prior to 2015, the Internet Movie Database (IMDb) publicly listed the formula used to rank their Top 250 movies list. To quote:
The formula for calculating the Top Rated 250 Titles gives a true Bayesian estimate:
weighted rating (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C
where:
R = average for the movie (mean)
v = number of votes for the movie
m = minimum votes required to be listed in the Top 250 (currently 25000)
C = the mean vote across the whole report (currently 7.0)
For the Top 250, only votes from regular voters are considered.
It's not so hard to understand. The formula is:
rating = (v / (v + m)) * R +
(m / (v + m)) * C;
Which can be mathematically simplified to:
rating = (R * v + C * m) / (v + m);
The variables are:
R – The item's own rating. R is the average of the item's votes. (For example, if an item has no votes, its R is 0. If someone gives it 5 stars, R becomes 5. If someone else gives it 1 star, R becomes 3, the average of [1, 5]. And so on.)
C – The average item's rating. Find the R of every single item in the database, including the current one, and take the average of them; that is C. (Suppose there are 4 items in the database, and their ratings are [2, 3, 5, 5]. C is 3.75, the average of those numbers.)
v – The number of votes for an item. (To give another example, if 5 people have cast votes on an item, v is 5.)
m – The tuneable parameter. The amount of "smoothing" applied to the rating is based on the number of votes (v) in relation to m. Adjust m until the results satisfy you. And don't misinterpret IMDb's description of m as "minimum votes required to be listed" – this system is perfectly capable of ranking items with fewer votes than m.
All the formula does is: add m imaginary votes, each with a value of C, before calculating the average. In the beginning, when there isn't enough data (i.e. the number of votes is dramatically less than m), this causes the blanks to be filled in with average data. However, as votes accumulate, eventually the imaginary votes will be drowned out by real ones.
In this system, votes don't cause the rating to fluctuate wildly. Instead, they merely perturb it a bit in some direction.
When there are zero votes, only imaginary votes exist, and all of them are C. Thus, each item begins with a rating of C.
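For concreteness, the formula in Python (a sketch; the m and C defaults are the IMDb values quoted above, and a small ratings site would plug in its own site-wide mean C and a much smaller m):

def weighted_rating(item_votes, m=25000, C=7.0):
    # item_votes: list of the item's individual vote values
    v = len(item_votes)
    R = sum(item_votes) / v if v else 0.0
    return (v / (v + m)) * R + (m / (v + m)) * C

# With a 5-star shop's hypothetical C=3.75 and m=10, the question's example
# sorts the way you'd want: many good votes beat a few perfect ones.
print(weighted_rating([5, 5, 5], m=10, C=3.75))          # ~4.04
print(weighted_rating([5] * 100 + [2, 2], m=10, C=3.75)) # ~4.83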
See also:
A demo. Click "Solve".
Another explanation of IMDb's system.
An explanation of a similar Bayesian star-rating system.
Evan Miller shows a Bayesian approach to ranking 5-star ratings. The sort criterion is

score = E - z_alpha/2 * sqrt((E2 - E^2) / (N + K + 1)),
with E = (sum over k of sk*(nk + 1)) / (N + K) and E2 = (sum over k of sk^2*(nk + 1)) / (N + K),

where
nk is the number of k-star ratings,
sk is the "worth" (in points) of k stars,
N is the total number of votes
K is the maximum number of stars (e.g. K=5, in a 5-star rating system)
z_alpha/2 is the 1 - alpha/2 quantile of a normal distribution. If you want 95% confidence (based on the Bayesian posterior distribution) that the actual sort criterion is at least as big as the computed sort criterion, choose z_alpha/2 = 1.65.
In Python, the sorting criterion can be calculated with

import math

def starsort(ns):
    """
    http://www.evanmiller.org/ranking-items-with-star-ratings.html
    """
    N = sum(ns)
    K = len(ns)
    s = list(range(K, 0, -1))
    s2 = [sk**2 for sk in s]
    z = 1.65
    def f(s, ns):
        N = sum(ns)
        K = len(ns)
        return sum(sk*(nk+1) for sk, nk in zip(s, ns)) / (N + K)
    fsns = f(s, ns)
    return fsns - z*math.sqrt((f(s2, ns) - fsns**2) / (N + K + 1))
For example, if an item has 60 five-stars, 80 four-stars, 75 three-stars, 20 two-stars and 25 one-stars, then its overall star rating would be about 3.4:
x = (60, 80, 75, 20, 25)
starsort(x)
# 3.3686975120774694
and you can sort a list of 5-star ratings with
sorted([(60, 80, 75, 20, 25), (10,0,0,0,0), (5,0,0,0,0)], key=starsort, reverse=True)
# [(10, 0, 0, 0, 0), (60, 80, 75, 20, 25), (5, 0, 0, 0, 0)]
This shows the effect that more ratings can have upon the overall star value.
You'll find that this formula tends to give an overall rating which is a bit lower than the overall rating reported by sites such as Amazon, eBay or Walmart, particularly when there are few votes (say, less than 300). This reflects the higher uncertainty that comes with fewer votes. As the number of votes increases (into the thousands), all of these rating formulas should tend towards the (weighted) average rating.
Since the formula only depends on the frequency distribution of 5-star ratings for the item itself, it is easy to combine reviews from multiple sources (or update the overall rating in light of new votes) by simply adding the frequency distributions together.
Unlike the IMDb formula, this formula does not depend on the average score across all items, nor on an artificial minimum-votes cutoff value.
Moreover, this formula makes use of the full frequency distribution, not just the average number of stars and the number of votes. And it makes sense that it should, since an item with ten 5-stars and ten 1-stars should be treated as having more uncertainty than (and therefore not rated as highly as) an item with twenty 3-star ratings:
In [78]: starsort((10,0,0,0,10))
Out[78]: 2.386028063783418
In [79]: starsort((0,0,20,0,0))
Out[79]: 2.795342687927806
The IMDb formula does not take this into account.
See this page for a good analysis of star-based rating systems, and this one for a good analysis of upvote-/downvote- based systems.
For up and down voting you want to estimate the probability that, given the ratings you have, the "real" score (if you had infinite ratings) is greater than some quantity (like, say, the similar number for some other item you're sorting against).
See the second article for the answer, but the conclusion is you want to use the Wilson confidence. The article gives the equation and sample Ruby code (easily translated to another language).
Well, depending on how complex you want to make it, you could have ratings additionally be weighted based on how many ratings the person has made, and what those ratings are. If the person has only made one rating, it could be a shill rating, and might count for less. Or if the person has rated many things in category a, but few in category b, and has an average rating of 1.3 out of 5 stars, it sounds like category a may be artificially weighed down by the low average score of this user, and should be adjusted.
But enough of making it complex. Let’s make it simple.
Assuming we're working with just two values, ReviewCount and AverageRating, for a particular item, it would make sense to me to look at ReviewCount as essentially being the "reliability" value. But we don't just want to bring scores down for low-ReviewCount items: a single one-star rating is probably as unreliable as a single 5-star rating. So what we want to do is probably average towards the middle: 3.
So, basically, I'm thinking of an equation something like X * AverageRating + Y * 3 = the-rating-we-want. In order to make this value come out right we need X + Y to equal 1. Also we need X to increase in value as ReviewCount increases: with a review count of 0, X should be 0 (giving us an equation of "3"), and with an infinite review count X should be 1 (which makes the equation = AverageRating).
So what are the X and Y equations? For the X equation, we want the dependent variable to asymptotically approach 1 as the independent variable approaches infinity. A good pair of equations is something like:
Y = 1/(factor^RatingCount)
and (utilizing the fact that X must be equal to 1-Y)
X = 1 - (1/(factor^RatingCount))
Then we can adjust "factor" to fit the range that we're looking for.
I used this simple C# program to try a few factors:
// We can adjust this factor to adjust our curve.
double factor = 1.5;

// Here's some sample data
double RatingAverage1 = 5;
double RatingCount1 = 1;

double RatingAverage2 = 4.5;
double RatingCount2 = 5;

double RatingAverage3 = 3.5;
double RatingCount3 = 50000; // 50000 is not infinite, but it's probably plenty to closely simulate it.

// Do the calculations
double modfactor = Math.Pow(factor, RatingCount1);
double modRating1 = (3 / modfactor) + (RatingAverage1 * (1 - 1 / modfactor));

double modfactor2 = Math.Pow(factor, RatingCount2);
double modRating2 = (3 / modfactor2) + (RatingAverage2 * (1 - 1 / modfactor2));

double modfactor3 = Math.Pow(factor, RatingCount3);
double modRating3 = (3 / modfactor3) + (RatingAverage3 * (1 - 1 / modfactor3));

Console.WriteLine(String.Format("RatingAverage: {0}, RatingCount: {1}, Adjusted Rating: {2:0.00}",
    RatingAverage1, RatingCount1, modRating1));
Console.WriteLine(String.Format("RatingAverage: {0}, RatingCount: {1}, Adjusted Rating: {2:0.00}",
    RatingAverage2, RatingCount2, modRating2));
Console.WriteLine(String.Format("RatingAverage: {0}, RatingCount: {1}, Adjusted Rating: {2:0.00}",
    RatingAverage3, RatingCount3, modRating3));

// Hold up for the user to read the data.
Console.ReadLine();
So that you don't have to bother copying it in, here is its output:
RatingAverage: 5, RatingCount: 1, Adjusted Rating: 3.67
RatingAverage: 4.5, RatingCount: 5, Adjusted Rating: 4.30
RatingAverage: 3.5, RatingCount: 50000, Adjusted Rating: 3.50
Something like that? You could obviously adjust the "factor" value as needed to get the kind of weighting you want.
You could sort by median instead of arithmetic mean. In this case both examples have a median of 5, so both would have the same weight in a sorting algorithm.
You could use a mode to the same effect, but median is probably a better idea.
If you want to assign additional weight to the product with 100 5-star ratings, you'll probably want to go with some kind of weighted mode, assigning more weight to ratings with the same median, but with more overall votes.
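One way to encode that median-first, votes-as-tiebreak ordering in Python (a sketch; the data is hypothetical):

import statistics

products = {
    "A": [5, 5, 5],                 # 3x 5-star
    "B": [5] * 100 + [2, 2],        # 100x 5-star, 2x 2-star
}

# sort by median rating, breaking ties by number of votes
ranked = sorted(products,
                key=lambda p: (statistics.median(products[p]), len(products[p])),
                reverse=True)
print(ranked)  # ['B', 'A'] -- same median of 5, so B wins on vote count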
If you just need a fast and cheap solution that will mostly work without using a lot of computation, here's one option (assuming a 1-5 rating scale):
SELECT Products.id, Products.title, AVG(Ratings.score), etc
FROM Products
INNER JOIN Ratings ON Products.id = Ratings.product_id
GROUP BY Products.id, Products.title
ORDER BY (SUM(Ratings.score) + 25.0) / (COUNT(Ratings.id) + 20.0) DESC,
         COUNT(Ratings.id) DESC
By adding 25.0 to the sum and 20.0 to the count, you're effectively padding every product with 20 phantom ratings averaging 1.25 each before sorting, which pulls sparsely-rated products towards that low baseline.
This does have known issues. For example, it unfairly rewards low-scoring products with few ratings (as this graph demonstrates, products with an average score of 1 and just one rating score a 1.2 while products with an average score of 1 and 1k+ ratings score closer to 1.05). You could also argue it unfairly punishes high-quality products with few ratings.
This chart shows what happens for all 5 ratings over 1-1000 ratings:
http://www.wolframalpha.com/input/?i=Plot3D%5B%2825%2Bxy%29/%2820%2Bx%29%2C%7Bx%2C1%2C1000%7D%2C%7By%2C0%2C6%7D%5D
You can see the dip upwards at the very bottom ratings, but overall it's a fair ranking, I think. You can also look at it this way:
http://www.wolframalpha.com/input/?i=Plot3D%5B6-%28%2825%2Bxy%29/%2820%2Bx%29%29%2C%7Bx%2C1%2C1000%7D%2C%7By%2C0%2C6%7D%5D
If you drop a marble on most places in this graph, it will automatically roll towards products with both higher scores and higher ratings.
Obviously, the low number of ratings puts this problem at a statistical handicap. Nevertheless...
A key element to improving the quality of an aggregate rating is to "rate the rater", i.e. to keep tabs of the ratings each particular "rater" has supplied (relative to others). This allows weighing their votes during the aggregation process.
Another solution, more of a cop-out, is to supply the end-users with a count (or a range indication thereof) of votes for the underlying item.
One option is something like Microsoft's TrueSkill system, where the score is given by mean - 3*stddev, where the constants can be tweaked.
After looking for a while, I chose the Bayesian system.
If someone is using Ruby, here is a gem for it:
https://github.com/wbotelhos/rating
I'd highly recommend the book Programming Collective Intelligence by Toby Segaran (O'Reilly, ISBN 978-0-596-52932-1), which discusses how to extract meaningful data from crowd behaviour. The examples are in Python, but they're easy enough to convert.

Algorithm to share/settle expenses among a group

I am looking for an algorithm for the problem below.
Problem: There is a set of people who owe each other some money, or none. Now, I need an algorithm (the best and neatest) to settle expenses among this group.
Person   AmtSpent
------   --------
A             400
B            1000
C             100
Total        1500
Now, the expense per person is 1500/3 = 500. This means A gives B 100, and C gives B 400. I know I can start with the least spent amount and work forward.
Can someone point me to the best approach, if you have one?
To sum up:
1. Find the total expense, and the expense per head.
2. Find the amount each person owes or is owed (-ve denotes outstanding).
3. Start with the least +ve amount. Allocate it against the -ve amounts.
4. Keep repeating step 3 until you run out of -ve amounts.
5. Move to the next bigger +ve number. Keep repeating steps 3 and 4 until no +ve numbers remain.
Or is there any better way to do it?
The best way to get back to zero state (minimum number of transactions) was covered in this question here.
I have created an Android app which solves this problem. You can input expenses during the trip, and it even recommends who should pay next. At the end it calculates who should send how much to whom. My algorithm calculates the minimum required number of transactions, and you can set a "transaction tolerance" which can reduce the number of transactions even further (if you don't care about $1 transactions). Try it out; it's called Settle Up:
https://market.android.com/details?id=cz.destil.settleup
Description of my algorithm:
I have a basic algorithm which solves the problem with n-1 transactions, but it's not optimal. It works like this: from the payments, I compute the balance for each member. The balance is what he paid minus what he should pay. I sort the members by balance, increasing. Then I always take the poorest and the richest, and a transaction is made between them. At least one of them ends up with a zero balance and is excluded from further calculations. Because of this, the number of transactions cannot be worse than n-1. It also minimizes the amount of money transferred. But it's not optimal, because it doesn't detect subgroups which can settle up internally.
Finding subgroups which can settle up internally is hard. I solve it by generating all combinations of members and checking if the sum of balances in a subgroup equals zero. I start with 2-pairs, then 3-pairs ... (n-1)-pairs. Implementations of combination generators are available. When I find a subgroup, I calculate the transactions within it using the basic algorithm described above. For every subgroup found, one transaction is spared.
The solution is optimal, but the complexity increases to O(n!). This looks terrible, but the trick is that in reality there will only be a small number of members. I have tested it on a Nexus One (1 GHz processor) and the results are: up to 10 members: <100 ms; 15 members: 1 s; 18 members: 8 s; 20 members: 55 s. So up to 18 members the execution time is fine. A workaround for >15 members is to use just the basic algorithm (it's fast and correct, but not optimal). A sketch of both phases follows after the source-code link below.
Source code:
The source code is available inside a report about the algorithm, written in Czech. The source code is at the end, and it is in English:
https://web.archive.org/web/20190214205754/http://www.settleup.info/files/master-thesis-david-vavra.pdf
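The two phases described above can be sketched in Python (my own rendering of the description, not the app's actual source; balances are paid-minus-share and must sum to zero, and the 1e-9 tolerance is mine):

from itertools import combinations

def greedy_settle(balances):
    # Basic n-1 scheme: repeatedly match the poorest with the richest.
    # balances: dict mapping person -> balance; the values must sum to zero.
    bal = dict(balances)
    txns = []
    while any(abs(v) > 1e-9 for v in bal.values()):
        poor = min(bal, key=bal.get)            # most negative balance
        rich = max(bal, key=bal.get)            # most positive balance
        amount = min(-bal[poor], bal[rich])
        txns.append((poor, rich, amount))       # poor pays rich
        bal[poor] += amount
        bal[rich] -= amount
    return txns

def settle_with_subgroups(balances):
    # Peel off zero-sum subgroups (smallest first); each spares one transaction.
    remaining = dict(balances)
    txns = []
    found = True
    while found:
        found = False
        for size in range(2, len(remaining)):
            for group in combinations(remaining, size):
                if abs(sum(remaining[p] for p in group)) < 1e-9:
                    txns += greedy_settle({p: remaining[p] for p in group})
                    for p in group:
                        del remaining[p]
                    found = True
                    break
            if found:
                break
    return txns + greedy_settle(remaining)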
You have described it already. Sum all the expenses (1500 in your case), divide by number of people sharing the expense (500). For each individual, deduct the contributions that person made from the individual share (for person A, deduct 400 from 500). The result is the net that person "owes" to the central pool. If the number is negative for any person, the central pool "owes" the person.
Because you have already described the solution, I don't know what you are asking.
Maybe you are trying to resolve the problem without the central pool, the "bank"?
I also don't know what you mean by "start with the least spent amount and work forward."
Javascript solution to the accepted algorithm:

const payments = {
  John: 400,
  Jane: 1000,
  Bob: 100,
  Dave: 900,
};

function splitPayments(payments) {
  const people = Object.keys(payments);
  const valuesPaid = Object.values(payments);
  const sum = valuesPaid.reduce((acc, curr) => curr + acc);
  const mean = sum / people.length;

  // Sort people by how much they paid, ascending.
  const sortedPeople = people.sort((personA, personB) => payments[personA] - payments[personB]);
  // Convert to balances: negative = owes money, positive = is owed money.
  const sortedValuesPaid = sortedPeople.map((person) => payments[person] - mean);

  let i = 0;                        // poorest person still owing
  let j = sortedPeople.length - 1;  // richest person still owed
  let debt;

  while (i < j) {
    debt = Math.min(-sortedValuesPaid[i], sortedValuesPaid[j]);
    sortedValuesPaid[i] += debt;
    sortedValuesPaid[j] -= debt;
    console.log(`${sortedPeople[i]} owes ${sortedPeople[j]} $${debt}`);
    if (sortedValuesPaid[i] === 0) {
      i++;
    }
    if (sortedValuesPaid[j] === 0) {
      j--;
    }
  }
}

splitPayments(payments);
/*
Bob owes Jane $400
Bob owes Dave $100
John owes Dave $200
*/
I have recently written a blog post describing an approach to solve the settlement of expenses between members of a group where potentially everybody owes everybody else, such that the number of payments needed to settle the debts is the least possible. It uses a linear programming formulation. I also show an example using a tiny R package that implements the solution.
I had to do this after a trip with my friends; here's a Python 3 version:
import numpy as np
import pandas as pd
# setup inputs
people = ["Athos", "Porthos", "Aramis"] # friends names
totals = [300, 150, 90] # total spent per friend
# compute matrix
total_spent = np.array(totals).reshape(-1,1)
share = total_spent / len(totals)
mat = (share.T - share).clip(min=0)
# create a readable dataframe
column_labels = [f"to_{person}" for person in people]
index_labels = [f"{person}_owes" for person in people]
df = pd.DataFrame(data=mat, columns=column_labels, index=index_labels)
df.round(2)
Returns this dataframe:

              to_Athos  to_Porthos  to_Aramis
Athos_owes           0           0          0
Porthos_owes        50           0          0
Aramis_owes         70          20          0
Read it like: "Porthos owes $50 to Athos" ....
This isn't the optimized version; it's the simple one. But it's simple code and may work in many situations.
I'd like to make a suggestion to change the core parameters, from a UX-standpoint if you don't mind terribly.
Whether it's services or products being expensed amongst a group, sometimes these things can be shared. For example, an appetizer, or private/semi-private sessions at a conference.
For things like an appetizer party tray, it's sort of implied that everyone has access, but not necessarily that everyone had some. Charging each person an even split of the expense when, say, only 30% of the people partook can cause contention when it comes to splitting the bill. Other groups of people might not care at all. So from an algorithm standpoint, you need to first decide which of these three choices will be used, probably per-expense:
Universally split
Split by those who partook, evenly
Split by proportion per-partaker
I personally prefer the second one in-general because it has the utility to handle whole-expense-ownership for expenses only used by one person, some of the people, and the whole group too. It also remedies the ethical question of proportional differences with a blanket generalization of, if you partook, you're paying an even split regardless of how much you actually personally had. As a social element, I would consider someone who had a "small sample" of something just to try it and then decided not to have anymore as a justification to remove that person from the people splitting the expense.
So, small-sampling != partaking ;)
Then you take each expense and iterate through the group of who partook in what, and atomically handle each of those items, and at the end provide a total per-person.
So in the end, you take your list of expenses and iterate through them with each person. At the end of the individual expense check, you take the people who partook and apply an even split of that expense to each person, and update each person's current split of the bill.
Pardon the pseudo-code:
list_of_expenses[] = getExpenseList()
list_of_agents_to_charge[] = getParticipantList()
for each expense in list_of_expenses
    list_of_partakers[] = getPartakerList(expense)
    for each partaker in list_of_partakers
        addChargeToAgent(expense.price / list_of_partakers.size,
                         list_of_agents_to_charge[partaker])
Then just iterate through your list_of_agents_to_charge[] and report each total to each agent.
You can add support for a tip by simply treating the tip like an additional expense to your list of expenses.
Straightforward, as you describe in your question:
This returns the expenses to be paid by everybody, in the original array.
Negative values: this person gets some money back.
Just hand whatever you owe to the next in line and then drop out. If you receive some, just wait for the second round. When done, reverse the whole thing. After these two rounds everybody has paid the same amount.
procedure SettleDepth(Expenses: array of double);
var
  i: Integer;
  s: double;
begin
  // Sum all amounts and divide by the number of people -- O(n)
  s := 0.0;
  for i := Low(Expenses) to High(Expenses) do
    s := s + Expenses[i];
  s := s / (High(Expenses) - Low(Expenses) + 1); // +1: count of people, not the last index

  // In-place change to the owed amount,
  // hand on what you owe,
  // and drop out if you are even
  for i := High(Expenses) downto Low(Expenses)+1 do begin
    Expenses[i] := s - Expenses[i];
    if (Expenses[i] > 0) then begin
      Expenses[i-1] := Expenses[i-1] + Expenses[i];
      Expenses.Delete(i);
    end else if (Expenses[i] = 0) then begin
      Expenses.Delete(i);
    end;
  end;
  Expenses[Low(Expenses)] := s - Expenses[Low(Expenses)];
  if (Expenses[Low(Expenses)] = 0) then begin
    Expenses.Delete(Low(Expenses));
  end;

  // hand on what you owe
  for i := Low(Expenses) to High(Expenses)-1 do begin
    if (Expenses[i] > 0) then begin
      Expenses[i+1] := Expenses[i+1] + Expenses[i];
    end;
  end;
end;
The idea (similar to what is asked, but with a twist: using a bit of a ledger concept) is to use a pool account where, for each bill, members either pay into the pool or draw from it.
For example, the Costco expenses are paid by Mr P, who draws $93.76 from the pool, while the other members each pay $46.88 into the pool.
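The per-bill pool flows are easy to compute; a minimal Python sketch of the idea (names are mine). The Costco numbers above are consistent with a three-member group: the bill is 3 x $46.88 = $140.64, and P draws the other two shares:

def pool_flows(bill_total, payer, members):
    # positive = pays into the pool; negative = draws from the pool
    share = bill_total / len(members)
    return {m: (share - bill_total if m == payer else share) for m in members}

print(pool_flows(140.64, "P", ["P", "Q", "R"]))
# {'P': -93.76, 'Q': 46.88, 'R': 46.88} (up to float rounding)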
There are obviously better ways to do it, but that would require solving an NP-hard problem, which could really slow down your application. Anyway, this is how I implemented the solution in Java for my Android application, using priority queues:
class CalculateTransactions {
    // In this class's naming: debtors are members of the group who are owed money;
    // creditors are members who have to pay money to the group.
    public static void calculateBalances(PriorityQueue<Balance> debtors, PriorityQueue<Balance> creditors) {
        // add members who are owed money to the debtors priority queue
        // add members who owe money to others to the creditors priority queue
    }

    public static void calculateTransactions() {
        results.clear(); // remove previously calculated transactions before calculating again
        // BalanceComparator makes these max-priority queues
        PriorityQueue<Balance> debtors = new PriorityQueue<>(members.size(), new BalanceComparator());
        PriorityQueue<Balance> creditors = new PriorityQueue<>(members.size(), new BalanceComparator());
        calculateBalances(debtors, creditors);

        /* Algorithm: Pick the largest element from debtors and the largest from creditors.
         * Ex: If debtors = {4,3} and creditors = {2,7}, pick 4 as the largest debtor and 7 as
         * the largest creditor. Now, do a transaction between them: the debtor with a balance
         * of 4 receives $4 from the creditor with a balance of 7, and the debtor is eliminated
         * from further transactions. Repeat until there are no creditors and debtors left.
         *
         * The priority queues help us find the largest creditor and debtor in constant time;
         * however, adding/removing a member takes O(log n) time.
         * Optimisation: this algorithm produces correct results, but the number of transactions
         * is not minimal. To minimise it, we could use the subset-sum algorithm, which is an NP
         * problem. The use of an NP solution could really slow down the app! */
        while (!creditors.isEmpty() && !debtors.isEmpty()) {
            Balance rich = creditors.peek(); // get the largest creditor
            Balance poor = debtors.peek();   // get the largest debtor
            if (rich == null || poor == null) {
                return;
            }
            String richName = rich.name;
            BigDecimal richBalance = rich.balance;
            creditors.remove(rich); // remove the creditor from the queue

            String poorName = poor.name;
            BigDecimal poorBalance = poor.balance;
            debtors.remove(poor); // remove the debtor from the queue

            // the amount to be sent from creditor to debtor
            BigDecimal min = richBalance.min(poorBalance);
            richBalance = richBalance.subtract(min);
            poorBalance = poorBalance.subtract(min);

            // record the transaction details in a HashMap
            HashMap<String, Object> values = new HashMap<>();
            values.put("sender", richName);
            values.put("recipient", poorName);
            values.put("amount", currency.charAt(5) + min.toString());
            results.add(values);

            // Consider a member settled if his outstanding balance is between 0.00 and 0.49;
            // otherwise add him back to his queue so he is available for further transactions.
            int compare = 1;
            if (poorBalance.compareTo(new BigDecimal("0.49")) == compare) {
                debtors.add(new Balance(poorBalance, poorName));
            }
            if (richBalance.compareTo(new BigDecimal("0.49")) == compare) {
                creditors.add(new Balance(richBalance, richName));
            }
        }
    }
}
I've created a React app that implements a bin-packing approach to split trip expenses among friends with the least number of transactions.
Check out the TypeScript file SplitPaymentCalculator.ts, which implements it.
You can find the working app's link on the homepage of the repo.
