How do you build a ratings implementation?

We need a "rating" system in a project we are working on, similar to the one in SO. However, in ours there are multiple entities that need to be "tagged" with a vote up (only up, never down, like an increment). Sometimes we will need to show all of the entities in order of what is rated highest, regardless of entity type, basically mixing the result sets, I guess. What data structures / algorithms do you use to implement this so that it is flexible and still scalable?

Since reddit's ranking algorithm rocks, it makes a lot of sense to have a look at it, if not copy it:
Given the time the entry was posted, A, and the reference time 7:46:43 a.m. December 8, 2005, B, we have ts as their difference in seconds:
ts = A - B
and x as the difference between the number of up votes U and the number of down votes D:
x = U - D
where y is the sign of x:
y = 1 if x > 0
y = 0 if x = 0
y = -1 if x < 0
and z as the maximum of the absolute value of x and 1:
z = |x| if |x| >= 1
z = 1 if |x| < 1
The rating is then a function ƒ(ts, y, z):
ƒ(ts, y, z) = log10(z) + (y • ts) / 45000
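A minimal Python sketch of the formula above, in case it helps to see it as code. The names `hot_score` and `REFERENCE_TIME` are made up for illustration, and treating the reference time as UTC is an assumption (the text does not give a time zone). For an upvote-only system you would simply pass `downs=0`.

```python
import math
from datetime import datetime, timedelta, timezone

# Reference time B from the formula; UTC is an assumption.
REFERENCE_TIME = datetime(2005, 12, 8, 7, 46, 43, tzinfo=timezone.utc)

def hot_score(ups, downs, posted_at):
    """Rating f(ts, y, z) = log10(z) + y * ts / 45000."""
    ts = (posted_at - REFERENCE_TIME).total_seconds()  # ts = A - B
    x = ups - downs                                    # x = U - D
    y = (x > 0) - (x < 0)                              # sign of x: 1, 0 or -1
    z = max(abs(x), 1)                                 # z = max(|x|, 1)
    return math.log10(z) + y * ts / 45000

# Example: 10 up votes, posted 45000 seconds after the reference time.
example = hot_score(10, 0, REFERENCE_TIME + timedelta(seconds=45000))
```

Because the score does not depend on the entity type, it can be stored in one column and used to sort mixed result sets.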


Solving Bitwise Equation

I have a bitwise equation of the form
X = (A & X) + (B & X)
where A and B are known integers and X is unknown. How do I find X?
Here, & is bitwise AND and + is arithmetic addition; A, B, and X are integers.
Zero is a trivial solution, but I should return it only if no other solution exists.
My approach: I know the range of X, so I could iterate over it in O(n) and check the condition, but the range could be very large, so this might not be efficient.
I also tried applying AND operations to both sides to shorten the equation, but could not reach a meaningful solution.
Let's begin by focusing on just one bit of X, the very last bit. It can be either 0 or 1, and depending on how A and B are structured, we may be able to rule certain options out. There are four combinations of the last bits of A and B, but there are really only three cases to consider because of symmetry:
Case 1: A and B end in zero. In that case, A & X ends in 0 and B & X ends in 0. Therefore, since X = (A & X) + (B & X), the last bit of X must be 0.
Case 2: One of A and B ends in 1 and the other ends in 0. Assume, without loss of generality, that A ends in 1 and B ends in 0. Then, looking only at the last bit, (A & X) + (B & X) = X + 0 = X, so either choice for the last bit of X works.
Case 3: A and B end in 1. In that case, A & X ends with the last bit of X and B & X ends with the last bit of X. Then the last bit of X is given by (A & X) + (B & X) = X + X = 2X = 0, since multiplying any bit by two and looking at the lowest resulting bit gives 0.
None of these cases produces a carry into the next position (in case 3 the bit of X is forced to 0), so the same reasoning applies to every bit independently. Stated differently, for each combination of A and B bits we can determine which bit(s) are possible for X by consulting a table, and then move one position to the left to process the next bit. The table is shown here:
A | B | X
---+---+---
0 | 0 | 0
0 | 1 | any
1 | 0 | any
1 | 1 | 0
Note that this matches your intuition that zero is always a solution, since these rules allow you to pick 0 for any bit that you'd like. But if you'd like to find a solution that isn't 0 everywhere, just fill in 1s any time you have a choice.
As an example, suppose A in binary is 011101001 and B in binary is 001101010. Then, using this table, we have these options:
A 011101001
B 001101010
X 0*00000**
That gives eight possibilities:
010000011
010000010
010000001
010000000
000000011
000000010
000000001
000000000
And we can check that, indeed, each of these is a solution to X = (A & X) + (B & X).
This solution runs in time O(b), where b is the number of bits in the numbers A and B. That's O(log A + log B), if you're given A and B numerically, which means that this is way faster than a brute-force search.
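The table says a bit of X may be 1 exactly where the bits of A and B differ, which is what A XOR B marks. Here is a small Python sketch under that reading, assuming non-negative integers; the function names are made up for illustration.

```python
def largest_solution(a, b):
    """Fill in 1 wherever the table allows a choice: that is a XOR b."""
    return a ^ b

def all_solutions(a, b):
    """Yield every X with X == (a & X) + (b & X), i.e. every submask of a ^ b."""
    mask = a ^ b
    sub = mask
    while True:
        yield sub
        if sub == 0:
            break
        sub = (sub - 1) & mask  # standard trick to step to the next smaller submask

A = 0b011101001
B = 0b001101010
solutions = sorted(all_solutions(A, B))
```

If `largest_solution(a, b)` is 0, then zero is the only solution.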
Hope this helps!

Code not working with bigger values of for loops

I am implementing Szudzik's pairing function in MATLAB, where I pair two values coming from two different matrices X and Y into a unique value with the function 'CantorPairing2D(X,Y)'. After this I reverse the process to check its invertibility, using the function 'InverseCantorPairing2(X)'. But I seem to get an unusual problem: when I check these functions on small matrices, say of size 10*10, they work fine, but in my code I have to use 256*256 matrices A and B, and then the code goes wrong. What it gives is a bit strange: when I invert the process, the values in the matrix A are the same as the values of B in some places, for instance A(1,1)=B(1,1) and A(1,2)=B(1,2). Can somebody help?
VRNEW=CantorPairing2D(VRPRO,BLOCK3);
function [ Z ] = CantorPairing2D( X,Y )
    [a,~] =(size(X));
    Z=zeros(a,a);
    for i=1:a
        for j=1:a
            if( X(i,j)~= (max(X(i,j),Y(i,j))) )
                Z(i,j)= X(i,j)+(Y(i,j))^2;
            else
                Z(i,j)= (X(i,j))^2+X(i,j)+Y(i,j);
            end
        end
    end
    Z=Z./1000;
end
function [ A,B ] = InverseCantorPairing2( X )
    [a, ~] =(size(X));
    Rfinal=X.*1000;
    A=zeros(a,a);
    B=zeros(a,a);
    for i=1:a
        for j=1:a
            if( ( Rfinal(i,j)- (floor( sqrt(Rfinal(i,j))))^2) < floor(sqrt(Rfinal(i,j))) )
                T=floor(sqrt(Rfinal(i,j)));
                B(i,j)=T;
                A(i,j)=Rfinal(i,j)-T^2;
            else
                T=floor( (-1+sqrt(1+4*Rfinal(i,j)))/2 );
                A(i,j)=T;
                B(i,j)=Rfinal(i,j)-T^2-T;
            end
        end
    end
end
Example: if
A = 45 16  7 17
     7 22 11 25
    11 12  9 17
     2 11  3  5
B = 0 0 0 1
    0 0 0 1
    1 1 1 1
    1 3 0 0
then after pairing I get
C = 2.0700 0.2720 0.0560 0.3070
    1.4060 0.5060 0.1320 0.6510
    0.1330 0.1570 0.0910 0.3070
    0.0070 0.1350 0.0120 0.0300
After the inverse pairing I should get the same A and the same B. But for bigger matrices it behaves unexpectedly, because some elements of A come out the same as those of B.
If possible, a counterexample where your code fails would help immensely.
I managed to reproduce your code's behaviour and have rewritten it in a vectorised fashion. The rewrite should still show the bug, but hopefully it is a first step towards uncovering the underlying logic and finding the bug itself.
I am not familiar with the specific algorithm, but I observe a discrepancy in the CantorPairing definition.
For elements where Y = X your if condition is false, since X = max(X,X); so for those elements Z would be X^2+X+Y, and since by hypothesis X = Y, you would have:
X^2+X+X = X^2+2*X;
Now, if we perturb the equation slightly and suppose Y = X + 10*eps, your if condition is true (since Y > X) and Z would be X + Y^2; since X is approximately equal to Y, this is approximately X + X^2.
Therefore your equation is very sensitive to numerical approximation (and you definitely have a discontinuity in Z). Again, I am not familiar with the algorithm and this may very well be the behaviour you want, but that seems unlikely, so I am pointing it out.
Following is my version of your code. I include it also because I hope it will help you get acquainted with logical indexing and vectorized code (which is the idiomatic form in MATLAB, and much faster than nested for loops).
function [ Z ] = CantorPairing2D( X,Y )
    [a,~] = size(X);
    Z = zeros(a,a);
    % logical mask: true where Y > X, i.e. where X is not the max of X and Y
    firstConditionIndeces = Y > X;
    % elements to which the first equation applies
    Z(firstConditionIndeces) = X(firstConditionIndeces) + Y(firstConditionIndeces).^2;
    % remaining elements
    Z(~firstConditionIndeces) = X(~firstConditionIndeces).^2 + X(~firstConditionIndeces) + Y(~firstConditionIndeces);
    Z = Z./1000;
end
function [ A,B ] = InverseCantorPairing2( X )
    [a, ~] = size(X);
    Rfinal = X.*1000;
    A = zeros(a,a);
    B = zeros(a,a);
    % element-wise floor of the square root (note .^ rather than ^ below)
    S = floor(sqrt(Rfinal));
    % condition deciding which update is applied
    indecesToWhichApplyFstFcn = (Rfinal - S.^2) < S;
    % elements to which the first update applies
    B(indecesToWhichApplyFstFcn) = S(indecesToWhichApplyFstFcn);
    A(indecesToWhichApplyFstFcn) = Rfinal(indecesToWhichApplyFstFcn) - S(indecesToWhichApplyFstFcn).^2;
    % remaining elements
    T = floor( (-1+sqrt(1+4*Rfinal(~indecesToWhichApplyFstFcn)))/2 );
    A(~indecesToWhichApplyFstFcn) = T;
    B(~indecesToWhichApplyFstFcn) = Rfinal(~indecesToWhichApplyFstFcn) - T.^2 - T;
end
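The thread does not establish what actually causes the failure on the 256*256 inputs. One thing that can be checked independently is whether the two formulas round-trip in exact integer arithmetic. The Python sketch below does that for non-negative integers; the function names are made up, and it deliberately leaves out the division and multiplication by 1000, since that step goes through floating point and is not guaranteed to give back exact integers.

```python
import math

def szudzik_pair(x, y):
    """Same two branches as the MATLAB code, for non-negative integers."""
    return x + y * y if x < y else x * x + x + y

def szudzik_unpair(z):
    """Inverse of szudzik_pair using an exact integer square root."""
    s = math.isqrt(z)
    r = z - s * s
    # r < s means the value came from the x < y branch
    return (r, s) if r < s else (s, r - s)

# Round-trip check over a small grid, including the x == y diagonal.
round_trip_ok = all(
    szudzik_unpair(szudzik_pair(x, y)) == (x, y)
    for x in range(300)
    for y in range(300)
)
```

If this check holds but the MATLAB version does not, the data type of the input matrices and the scaling step would be the next things to inspect.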

Can we write an algorithm which gives me two whole numbers X and Y when I want to get a desired fraction F such that F= X/Y?

I am preparing a test data set in which I have to check rounding. Suppose I want to check whether round, round_up and round_down work correctly at the 10th decimal place.
Then:
if X = 100 and Y = 54, X/Y = 1.8518518518518518518518518518519 (test round equidistant)
if X = 10 and Y = 7, X/Y = 1.4285714285714285714285714285714 (test round_up)
if X = 10 and Y = 3, X/Y = 3.3333333333333333333333333333333 (test round_down)
Can we write an algorithm in which
the input is the rounding mode (round_up, round, round_down) and the decimal place I want to round at (10 in our example), and
the output is X and Y like above?
If the required location is p (=10 in your example), then y=10^p and then you can choose any x you want.
Depending on the language you are using, p might be too big for you to do 10^p, so in the worst case just divide the result from x/y by 10, 100 or whatever is necessary.
Or you can do it like this:
# n = number of fractional digits to return
def getFraction(a, b, n):
    result = ""
    r = a % b  # remainder carried from one digit to the next
    for i in range(n):
        r *= 10
        result += str(r // b)  # next decimal digit
        r %= b
    return result
getFraction(10, 7, 11) # returns "42857142857", since 10/7 = 1.42857142857...
This is the long division you learnt to do with pen and paper in elementary school.
Actually, if the required digit is d, then if d is not 9, the answer would be x=d,y=9 regardless of p which is the position of the digit. If d is 9, then if p is odd, the answer is x=10,y=11 and if p is even, x=1,y=11. If a trivial answer for d=0 won't do, the mirror answer for d=9 is suitable, that is, if d=0 and p is odd, the answer is x=1,y=11, and if p is even, x=10,y=11. A lot shorter than an answer with y=10^p and certainly fitting in nearly any architecture.
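The rule above can be checked with a few lines of Python. This is only a sketch: `fraction_with_digit` and `digit_at` are names invented here, positions are counted from 1 after the decimal point, and d = 0 returns the trivial 0/9.

```python
def fraction_with_digit(d, p):
    """Return (x, y) such that the p-th decimal digit of x/y is d."""
    if d == 9:
        # 10/11 = 0.9090..., 1/11 = 0.0909...
        return (10, 11) if p % 2 == 1 else (1, 11)
    return (d, 9)  # d/9 = 0.ddd... for d in 0..8

def digit_at(x, y, p):
    """p-th decimal digit of x/y, computed with integers only."""
    return (x * 10 ** p // y) % 10

# Check every digit at the first 30 positions.
rule_holds = all(
    digit_at(*fraction_with_digit(d, p), p) == d
    for d in range(10)
    for p in range(1, 31)
)
```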

Pseudo-Random Variable

I have a variable, between 0 and 1, which should dictate the likelihood that a second variable, a random number between 0 and 1, is greater than 0.5. In other words, if I were to generate the second variable 1000 times, the average should be approximately equal to the first variable's value. How do I code this?
Oh, and the second variable should always be capable of producing either 0 or 1 in any condition, just more or less likely depending on the value of the first variable. Here is a link to a graph which models approximately how I would like the program to behave. Each equation represents a separate value for the first variable.
You have a variable p and you are looking for a mapping function f(x) that maps random rolls x in [0, 1] to the same interval [0, 1] such that the expected value, i.e. the average of all rolls, is p.
You have chosen the function prototype
f(x) = pow(x, c)
where c must be chosen appropriately. If x is uniformly distributed in [0, 1], the average value must satisfy:
int(f(x) dx, [0, 1]) == p
With the integral:
int(pow(x, c) dx) == pow(x, c + 1) / (c + 1) + K
evaluated from 0 to 1, one gets 1 / (c + 1) == p, and therefore:
c = 1/p - 1
A different approach is to make p the median value of the distribution, such that half of the rolls fall below p, the other half above p. This yields a different distribution. (I am aware that you didn't ask for that.) Now, we have to satisfy the condition:
f(0.5) == pow(0.5, c) == p
which yields:
c = log(p) / log(0.5)
With the current function prototype, you cannot satisfy both requirements. Your function is also asymmetric (f(x, p) != f(1-x, 1-p)).
Python functions below (p must be greater than 0):
import math
import random

def medianrand(p):
    """Random number between 0 and 1 whose median is p"""
    c = math.log(p) / math.log(0.5)
    return math.pow(random.random(), c)

def averagerand(p):
    """Random number between 0 and 1 whose expected value is p"""
    c = 1/p - 1
    return math.pow(random.random(), c)
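To see the difference between the two choices of c, here is a rough empirical check. It is a sketch only: the samplers are restated with an explicit, seeded generator so the run is repeatable, and `summarize`, the sample size and the seed are arbitrary choices made here.

```python
import math
import random

def averagerand(p, rng):
    """x ** c with c = 1/p - 1, so the expected value is p."""
    return rng.random() ** (1 / p - 1)

def medianrand(p, rng):
    """x ** c with c = log(p) / log(0.5), so the median is p."""
    return rng.random() ** (math.log(p) / math.log(0.5))

def summarize(sampler, p, n=200_000, seed=0):
    """Return (sample mean, fraction of samples below p)."""
    rng = random.Random(seed)
    xs = [sampler(p, rng) for _ in range(n)]
    return sum(xs) / n, sum(x < p for x in xs) / n

mean_avg, _ = summarize(averagerand, 0.3)
_, below_med = summarize(medianrand, 0.3)
```

With these settings `mean_avg` should land close to 0.3 and `below_med` close to 0.5.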
You can do this by using a dummy. First set the first variable to a value between 0 and 1. Then create a random number in the dummy between 0 and 1. If this dummy is bigger than the first variable, you generate a random number between 0 and 0.5, and otherwise you generate a number between 0.5 and 1.
In pseudocode:
real a = 0.7
real total = 0.0
for i between 0 and 1000 begin
    real dummy = rand(0,1)
    real b
    if dummy > a then
        b = rand(0,0.5)
    else
        b = rand(0.5,1)
    end if
    total = total + b
end for
real avg = total / 1000
Please note that this algorithm will generate average values between 0.25 and 0.75. For a = 1 it will only generate random values between 0.5 and 1, which should average to 0.75. For a=0 it will generate only random numbers between 0 and 0.5, which should average to 0.25.
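A runnable Python version of the pseudocode, as a sketch: the names `dummy_rand` and `sample_mean`, the seed and the sample size are choices made here. By construction the result is above 0.5 with probability a, and the average works out to 0.25 + 0.5 * a, which matches the 0.25 to 0.75 range noted above.

```python
import random

def dummy_rand(a, rng):
    """Upper half with probability a, lower half otherwise."""
    if rng.random() > a:
        return rng.uniform(0.0, 0.5)
    return rng.uniform(0.5, 1.0)

def sample_mean(a, n=100_000, seed=0):
    """Average of n draws for a given a, using a seeded generator."""
    rng = random.Random(seed)
    return sum(dummy_rand(a, rng) for _ in range(n)) / n

avg = sample_mean(0.7)  # expected to be near 0.25 + 0.5 * 0.7 = 0.6
```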
I've made a sort of pseudo-solution to this problem, which I think is acceptable.
Here is the algorithm I made:
a = 0.2 # variable one
b = 0 # variable two
b = random.random()
b = b**(1/(2**(4*a-1)))
It doesn't actually produce the average results that I wanted, but it's close enough for my purposes.
Edit: Here's a graph I made that consists of a large number of data points I generated with a Python script using this algorithm:
import random
mod = 6
div = 100
samples = 100000
for z in range(div):
    a = (z + 1) / div  # variable one
    s = 0.0
    for _ in range(samples):
        b = random.random()  # variable two
        c = b ** (1 / (2 ** ((mod * a * 2) - mod)))
        s += c
    # print a and the average of the generated values, tab-separated
    print(f"{a}\t{round(s / samples, 3)}")
Each point in the table is the result of 100000 randomly generated points from the algorithm; their x positions being the a value given, and their y positions being their average. Ideally they would fit to a straight line of y = x, but as you can see they fit closer to an arctan equation. I'm trying to mess around with the algorithm so that the averages fit the line, but I haven't had much luck as of yet.

Algorithm to price bulk discounts

I am designing a Chinese auction website.
Tickets ($5, $10 & $20) are sold either individually or in packages that give a discount.
There are various ticket packages, for example:
5-$5 tickets = receive 10% off
5-$10 tickets = receive 10% off
5-$20 tickets = receive 10% off
5-$5 tickets + 5-$10 tickets + 5-$20 tickets = receive 15% off
When users add tickets to their cart, I need to figure out the cheapest package(s) to give them. The trick is that if a user adds 4-$5 tickets + 5-$10 tickets + 5-$20 tickets, it should still give him package #4, since that would be the cheapest for him.
Any help in figuring out an algorithm to solve this, or any tips, would be greatly appreciated.
Thanks
EDIT
I figured out the answer, thanks all, but the code is long.
I will post the answer code if anyone is still interested.
After selling the customer as many complete packages as possible, we are left with some residual number of tickets desired for each of the 3 types ($5, $10, $20). In the example you gave, the quantities desired range from 0 to 5 (6 possible values). Thus, there are only 214 possible residual combinations (6 ** 3 - 2; minus 2 because the combinations 0-0-0 and 5-5-5 are degenerate). Just pre-compute the price of each combination as though it were purchased without package 4, and compare that calculation to the cost of package 4 ($148.75); this will tell you the cheapest approach for every combination.
Is the actual number of packages so large that a complete pre-computation wouldn't be a viable approach?
One approach is dynamic programming.
The idea is that if the buyer wants x of item A, y of item B, and z of item C, then you should compute for all triples (x', y', z') with 0 <= x' <= x and 0 <= y' <= y and 0 <= z' <= z the cheapest way to obtain at least x' of A, y' of B, and z' of C. Pseudocode:
for x' = 0 to x
    for y' = 0 to y
        for z' = 0 to z
            cheapest[(x', y', z')] = min over all packages p of (price(p) + cheapest[residual demand after buying p])
            next_package[(x', y', z')] = the best package p
Then you can work backward from (x, y, z) adding to the cart the packages indicated by next_package.
If there are many different kinds of items or there are many of each item, branch and bound may be a better choice.
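Here is one way the dynamic programming idea could look in Python, as a sketch rather than a finished implementation. It uses memoised recursion instead of explicit loops, keeps prices in integer cents to avoid rounding, and treats single tickets as "packages" of size one. The `OFFERS` table and the function name `cheapest` are assumptions made for this example, built from the four packages in the question.

```python
from functools import lru_cache

# (tickets supplied of each type ($5, $10, $20), price in cents)
OFFERS = [
    ((1, 0, 0), 500), ((0, 1, 0), 1000), ((0, 0, 1), 2000),   # single tickets
    ((5, 0, 0), 2250), ((0, 5, 0), 4500), ((0, 0, 5), 9000),  # 10% off packs
    ((5, 5, 5), 14875),                                       # 15% off combo
]

@lru_cache(maxsize=None)
def cheapest(x, y, z):
    """Cheapest (cost, offers used) to get at least x, y, z tickets."""
    demand = (x, y, z)
    if demand == (0, 0, 0):
        return 0, ()
    best = None
    for supplied, price in OFFERS:
        # Skip offers that cover nothing we still need.
        if not any(s > 0 and d > 0 for s, d in zip(supplied, demand)):
            continue
        rest = tuple(max(0, d - s) for d, s in zip(demand, supplied))
        cost, plan = cheapest(*rest)
        if best is None or price + cost < best[0]:
            best = (price + cost, (supplied,) + plan)
    return best

cost, plan = cheapest(4, 5, 5)  # the 4 + 5 + 5 cart from the question
```

For the 4 + 5 + 5 cart, the plan consists of the combo package alone, which is the behaviour the question asks for.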
First, calculate how many full Package 4s you need. Get them out of the way.
full_package_4_count = min(x, y, z) div 5
x = x - 5 * full_package_4_count
y = y - 5 * full_package_4_count
z = z - 5 * full_package_4_count
Now, it may still be worth buying some more Package 4s, even if the customer didn't actually want that many tickets.
How many of them could there be?
partial_package_4_max = (max(x, y, z) + 4) div 5
Now loop to try each of these out:
best_price = 10000000
for partial_package_4_count = 0 to partial_package_4_max:
    -- Calculate how much we have already spent.
    price = (full_package_4_count + partial_package_4_count) * 175 * (1-0.15)
    -- Work out how many additional tickets we want.
    x' = max(0, x - 5 * partial_package_4_count)
    y' = max(0, y - 5 * partial_package_4_count)
    z' = max(0, z - 5 * partial_package_4_count)
    -- Add cost for additional tickets (10% discount for every complete pack of 5, full price for the rest)
    price = price + (x' div 5) * 25 * (1-0.10) + (x' mod 5) * 5
    price = price + (y' div 5) * 50 * (1-0.10) + (y' mod 5) * 10
    price = price + (z' div 5) * 100 * (1-0.10) + (z' mod 5) * 20
    if price < best_price
        best_price = price
        -- Should record other details about the current deal here too.
