Percentage of a set of numbers - algorithm

I’ve been asked to make a chart that displays how many users are on each page. The chart will be color-coded: the page with the most users gets 100% opacity, the page with the fewest users gets 40% opacity, and the pages in between get a proportional opacity.
So, I have a set of numbers, and I need:
The largest number to be 100%
The smallest number to be 40%
The numbers in between to fall between 100% and 40%
The numbers can be any whole number 1 or greater.
So given 1000, 500, 100, 50
1000 should be 100%
500 should be ??
100 should be ??
50 should be 40%
Or, given 2, 1
2 should be 100%
1 should be 40%
Or given 1000, 1
1000 should be 100%
1 should be 40%
Or given 7485, 395, 3
7485 should be 100%
395 should be ??
3 should be 40%
I hope that makes sense.
What equation can I use to solve this?
I know that getting a plain percentage is just (# / largest_number) * 100, but I’m lost trying to map it into the 100%–40% range.
The closest I could get is ((# / largest_number) * 60) + 40, but that assumes 0 is the smallest number, and it gives me 43% for the smallest number in the first set (50) instead of the 40% I need.
Thanks in advance!

Instead of getting the percent with (# / largest_number) * 100, compute x = (# - lowest_number) / (largest_number - lowest_number).
This is basic min-max normalization.
This way the largest number always maps to 1 and the lowest number always maps to 0, and you can then interpolate with your function (x * 60) + 40.
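For what it's worth, here is a tiny Ruby sketch of that normalization plus interpolation (the sample values are the ones from the question):

values = [1000, 500, 100, 50]
lo, hi = values.min, values.max
opacities = values.map do |v|
  x = hi == lo ? 1.0 : (v - lo).to_f / (hi - lo)   # normalize to 0..1 (guard against a single-value set)
  x * 60 + 40                                       # interpolate into the 40..100 range
end
# => [100.0, ~68.4, ~43.2, 40.0]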

Related

Is there an adaptation to the Marching Squares algorithm to make it lossless compression for constrained inputs?

I'm using the Marching Squares algorithm to take a lattice of values and turn it into a contour where the values exceed 50%. My values have the property that most are 0% or 100%, and the transition from 0% to 100% occurs across at most a single intervening value, so the resulting contour passes through every lattice position whose value is greater than 0% and less than 100%. For example, consider this field of values, representing the approximate percentages shown as greyscale squares in the following image:
0 0 0 0 0 0 0 0
0 0 6 71 71 20 0 0
0 28 35 100 100 48 20 0
0 100 100 100 100 100 71 0
0 100 100 100 100 100 71 0
0 9 18 100 100 35 6 0
0 0 9 100 100 28 0 0
0 0 0 0 0 0 0 0
The traditional Marching Squares algorithm would produce a contour as shown in this image:
The blue field represents the contour and the greyscale squares represent the lattice values for the above data.
Given the resulting contour, I can convert it back to a lattice of numbers by taking the area covered by the contour within each lattice position as the recreated value for that position. For the above contour, this is shown in an image with the same contour and the recovered values drawn as greyscale squares.
The new values are similar but not exactly the same as the originals: some are larger, others are smaller, so information has been lost and the algorithm is lossy compression. The decompressed field of values looks approximately like this:
0 0 0 0 0 0 0 0
0 0 3 67 70 4 0 0
0 12 43 100 100 59 4 0
0 91 100 100 100 100 70 0
0 88 100 100 100 100 67 0
0 4 27 100 100 43 3 0
0 0 3 88 91 12 0 0
0 0 0 0 0 0 0 0
Is there a way to adjust the linear interpolation step so that it does not lose information, or at least comes much closer to the original data field? If not, can extra points be added to the contour to resolve this? For example, perhaps the interpolation step is left as is, but instead of a straight line between the points, extra points are added along the path to force the desired area in each corner of the four lattice squares considered at each step of the Marching Squares algorithm?
In the lower right area of the example, one step of the Marching Squares algorithm finds these four values:
100 28
0 0
The interpolation produces 50% on the left side and about 70% on the top side. On the left, point A is placed exactly on the border between the 0% square in the lower left and the 100% square in the upper left. On the top, point B is placed 70% of the way toward the center of the 28% value in the upper right. The result is a diagonal line from A to B that cuts off the upper left corner, whose value is 100.
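For reference, here is a minimal Ruby sketch of the standard edge interpolation that produces those two fractions, using the quoted corner values and a threshold of 50:

def edge_crossing(v_from, v_to, threshold = 50.0)
  # fraction of the way from v_from to v_to at which the threshold is crossed
  (threshold - v_from) / (v_to - v_from)
end

puts edge_crossing(100.0, 0.0)    # left edge:  0.5  -> point A at the midpoint
puts edge_crossing(100.0, 28.0)   # top edge:  ~0.69 -> point B about 70% of the way across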
We could add additional intervening points between A and B such that the area values are not lost upon conversion back (decompression) from contour to lattice values. For example, consider this drawing:
The original Marching Squares gives the points A and B in the drawing. The yellow highlight shows additional points X, Y, and Z that could be added so that the area covered is 100% in the upper left, 0% in the lower left, and 28% in the upper right. For the 28%, 14% is handled below point B and 14% above point B.
Is this a known problem that has existing solutions or are there similar problems in compression of images that can be drawn upon to help solve this problem? Does the proposed solution seem reasonable or can it be simplified further? I'm concerned that it will be pretty complex to handle the four quadrants for each of the 14 variations of Marching Squares that produce lines, so if there is a way to simplify this, I'd like to find it.
In summary, I would like to adjust the computation of the blue contour so that the area of each lattice square covered by the contour matches the original data used to create it, giving lossless compression: a conversion from lattice to contour that is perfectly reversible.

Algorithm: Fill different baskets

Let's assume I have 3 different baskets, each with a fixed capacity,
and n products which provide a different value for each basket -- you can only pick whole products.
Each product is limited to a maximum quantity (e.g. you can pick product A at most 5 times).
Every product adds zero or more value to each basket, and the products come in all kinds of variations.
Now I want a list of all possible combinations of products fitting into the baskets, ordered by accuracy (e.g. if basket 1 is 5% overfull, that combination is 5% less accurate).
Edit: Example
Basket A capacity 100
Basket B capacity 80
Basket C capacity 30
fake products
Product 1 (A: 5, B: 10, C: 1)
Product 2 (A: 20 B: 0, C: 0)
There might be hundreds more products
Best fit with max 5 each would be
5 times Product 1
4 times Product 2
Result
A: 105
B: 50
C: 5
Accuracy: (qty_used / max_qty) * 100 = (160 / 210) * 100 = 76.190%
Next would be another combination with less accuracy
Any pointer in the right direction is highly appreciated. Thanks!
Edit:
Instead of the accuracy above, the measure should be an error, and the list should be in ascending order of error.
Error(Basket x) = (|max_qty(x) - qty_used(x)| / max_qty(x)) * 100
and the overall error should be the weighted average of the errors of all baskets.
Total Error = [Σ (Error(x) * max_qty(x))] / [Σ (max_qty(x))]
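To make the edit concrete, here is a small Ruby sketch of that error metric applied to the example above (the capacities and quantities are the asker's illustration numbers):

capacities = { a: 100, b: 80, c: 30 }
used       = { a: 105, b: 50, c: 5 }   # 5 x Product 1 + 4 x Product 2

# per-basket error: |capacity - used| / capacity * 100
errors = capacities.map do |basket, cap|
  [basket, (cap - used[basket]).abs.to_f / cap * 100]
end.to_h
# a: 5.0, b: 37.5, c: ~83.3

# total error: average of the per-basket errors, weighted by capacity
total_error = capacities.sum { |basket, cap| errors[basket] * cap } / capacities.values.sum.to_f
# (5*100 + 37.5*80 + 83.3*30) / 210 ~ 28.6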

Find the optimum number of non-uniform bins

R - Problem: to find the optimum number of non-uniform bins to show a range of data points.
I have a bunch of data points (let us assume different prices of different mobiles). I need to categorize these mobile phones into categories based on price. The bin size (which in this example is the price range) need not be uniform (there might be lots of mobiles in the low-price category and few in the long tail).
Is there any efficient algorithm to find the optimum number of bins required and the number of data points (in this case mobile phones) that should go into each category?
This is not a standard formula, but I wanted to post it as it seems to work well with the data sets I tested.
Find the average price of all the mobiles.
Ex: 5 mobiles with prices 10, 20, 40, 80, 200
Avg is 350/5 = 70
Subtract minimum price from average price: 70 - 10 = 60 -> name it N1
Subtract avg price from Max price: 200 - 70 = 130 -> name it N2
Find the ratio N2/N1 : 130/60: Roughly 2
This indicates that it is better to have 2 bins at the lower price range for every 1 bin at higher range.
So, for example, take 2 bins below 70: range 0 - 35 (2 mobiles), 36 - 70 (1 mobile)
1 bin above 70: range 71 - 200 (2 mobiles)
As you can see, number of bins and bin sizes are reasonably optimal.
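Here is a rough Ruby sketch of that heuristic, using the example prices above (this is the answerer's rule of thumb, not a standard binning method):

prices = [10, 20, 40, 80, 200]
avg = prices.sum.to_f / prices.size     # 70.0
n1  = avg - prices.min                  # 60.0  -> spread below the average
n2  = prices.max - avg                  # 130.0 -> spread above the average
ratio = (n2 / n1).round                 # ~2: use two bins below the average for every one above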

Basic Velocity Algorithm?

Given the following dataset for a single article on my site:
Article 1
2/1/2010 100
2/2/2010 80
2/3/2010 60
Article 2
2/1/2010 20000
2/2/2010 25000
2/3/2010 23000
where column 1 is the date and column 2 is the number of pageviews for an article. What is a basic velocity calculation that can be done to determine if this article is trending upwards or downwards for the most recent 3 days?
Caveat: the articles will not know the total number of pageviews, only their own totals. Ideally the result would be a number between 0 and 1. Any pointers to what this class of algorithms is called?
Thanks!
Update: Your data actually already is a list of velocities (pageviews/day). The following answer simply shows how to find the average velocity over the past three days. See my other answer for how to calculate pageview acceleration, which is the real statistic you are probably looking for.
Velocity is simply the change in a value (delta pageviews) over time:
For article 1 on 2/3/2010:
delta pageviews = 100 + 80 + 60
= 240 pageviews
delta time = 3 days
pageview velocity (over last three days) = [delta pageviews] / [delta time]
= 240 / 3
= 80 pageviews/day
For article 2 on 2/3/2010:
delta pageviews = 20000 + 25000 + 23000
= 68000 pageviews
delta time = 3 days
pageview velocity (over last three days) = [delta pageviews] / [delta time]
= 68,000 / 3
= 22,666 + 2/3 pageviews/day
Now that we know the maximum velocity, we can scale all the velocities to get relative velocities between 0 and 1 (or between 0% and 100%):
relative pageview velocity of article 1 = velocity / MAX_VELOCITY
= 80 / (22,666 + 2/3)
~ 0.00352941
~ 0.352941%
relative pageview velocity of article 2 = velocity / MAX_VELOCITY
= (22,666 + 2/3)/(22,666 + 2/3)
= 1
= 100%
"Pageview trend" likely refers to pageview acceleration, not velocity. Your dataset actually already is a list of velocities (pageviews/day). Pageviews are non-decreasing values, so pageview velocity can never be negative. The following describes how to calculate pageview acceleration, which may be negative.
PV_acceleration(t1,t2) = (PV_velocity{t2} - PV_velocity{t1}) / (t2 - t1)
("PV" == "Pageview")
Explanation:
Acceleration is simply change in velocity divided by change in time. Since your dataset is a list of page view velocities, you can plug them directly into the formula:
PV_acceleration("2/1/2010", "2/3/2010") = (60 - 100) / ("2/3/2010" - "2/1/2010")
= -40 / 2
= -20 pageviews per day per day
Note the data for "2/2/2010" was not used. An alternate method is to calculate three PV_accelerations (each using a date range that goes back only a single day) and average them. There is not enough data in your example to do this for three days, but here is how to do it for the last two:
PV_acceleration("2/3/2010", "2/2/2010") = (60 - 80) / ("2/3/2010" - "2/2/2010")
= -20 / 1
= -20 pageviews per day per day
PV_acceleration("2/2/2010", "2/1/2010") = (80 - 100) / ("2/2/2010" - "2/1/2010")
= -20 / 1
= -20 pageviews per day per day
PV_acceleration_average("2/3/2010", "2/2/2010") = (-20 + -20) / 2
= -20 pageviews per day per day
This alternate method did not make a difference for the article 1 data because the page view acceleration did not change between the two days, but it will make a difference for article 2.
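For reference, here is a small Ruby sketch of both the averaged velocity and the acceleration on the example data (the dates are reduced to array order, and avg_velocity / acceleration are illustrative helper names, not part of any library):

article1 = [100, 80, 60]            # pageviews/day, oldest first
article2 = [20000, 25000, 23000]

def avg_velocity(views)
  views.sum.to_f / views.size       # average pageviews per day over the window
end

def acceleration(views)
  (views.last - views.first).to_f / (views.size - 1)   # change in daily views per day
end

max_v = [avg_velocity(article1), avg_velocity(article2)].max
relative1 = avg_velocity(article1) / max_v   # ~0.0035 for article 1
acc1 = acceleration(article1)                # (60 - 100) / 2 = -20, trending down
acc2 = acceleration(article2)                # (23000 - 20000) / 2 = 1500, trending up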
Just a link to an article about the 'trending' algorithms that Reddit, StumbleUpon, and Hacker News use, among others.
http://www.seomoz.org/blog/reddit-stumbleupon-delicious-and-hacker-news-algorithms-exposed

How to optimize the layout of rectangles

I have a dynamic number of equally proportioned and sized rectangular objects that I want to optimally display on the screen. I can resize the objects but need to maintain proportion.
I know what the screen dimensions are.
How can I calculate the optimal number of rows and columns that I will need to divide the screen in to and what size I will need to scale the objects to?
Thanks,
Jamie.
Assuming that all rectangles have the same dimensions and orientation, and that these should not be changed:
Let's play!
// Proportion of the screen
// w,h width and height of your rectangles
// W,H width and height of the screen
// N number of your rectangles that you would like to fit in
// ratio
r = (w*H) / (h*W)
// This ratio is important since we can define the following relationship
// nbRows and nbColumns are what you are looking for
// nbColumns = nbRows * r (there will be problems of integers)
// we are looking for the minimum values of nbRows and nbColumns such that
// N <= nbRows * nbColumns = (nbRows ^ 2) * r
nbRows = ceil ( sqrt ( N / r ) ) // r is positive...
nbColumns = ceil ( N / nbRows )
I hope I got my maths right, but that cannot be far from what you are looking for ;)
EDIT:
There is not much difference between having the ratio and having the width and height...
// If ratio = w/h
r = ratio * (H/W)
// If ratio = h/w
r = H / (W * ratio)
And then you're back to using 'r' to find out how many rows and columns to use.
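A quick Ruby sketch of those formulas, using the same example as the next answer (a 640 x 480 screen, 5 x 3 rectangles, N = 14):

w, h = 5.0, 3.0       # rectangle width and height
W, H = 640.0, 480.0   # screen width and height
N = 14                # number of rectangles to fit

r = (w * H) / (h * W)                  # 1.25
nb_rows    = Math.sqrt(N / r).ceil     # smallest row count with N <= rows^2 * r  -> 4
nb_columns = (N.to_f / nb_rows).ceil   # -> 4, i.e. a 4 x 4 grid for this example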
Jamie, I interpreted "optimal number of rows and columns" to mean "how many rows and columns will provide the largest rectangles, consistent with the required proportions and screen size". Here's a simple approach for that interpretation.
Each possible choice (number of rows and columns of rectangles) results in a maximum possible size of rectangle for the specified proportions. Looping over the possible choices and computing the resulting size implements a simple linear search over the space of possible solutions. Here's a bit of code that does that, using an example screen of 480 x 640 and rectangles in a 3 x 5 proportion.
def min(a, b)
  a < b ? a : b
end

screenh, screenw = 480, 640
recth, rectw = 3.0, 5.0
ratio = recth / rectw
puts ratio
nrect = 14

(1..nrect).each do |nhigh|
  nwide = ((nrect + nhigh - 1) / nhigh).truncate         # columns needed for nhigh rows
  maxh, maxw = (screenh / nhigh).truncate, (screenw / nwide).truncate
  relh, relw = (maxw * ratio).truncate, (maxh / ratio).truncate
  acth, actw = min(maxh, relh), min(maxw, relw)           # actual size keeping the 3:5 proportion
  area = acth * actw
  puts [nhigh, nwide, maxh, maxw, relh, relw, acth, actw, area].join("\t")
end
Running that code provides the following trace:
1 14 480 45 27 800 27 45 1215
2 7 240 91 54 400 54 91 4914
3 5 160 128 76 266 76 128 9728
4 4 120 160 96 200 96 160 15360
5 3 96 213 127 160 96 160 15360
6 3 80 213 127 133 80 133 10640
7 2 68 320 192 113 68 113 7684
8 2 60 320 192 100 60 100 6000
9 2 53 320 192 88 53 88 4664
10 2 48 320 192 80 48 80 3840
11 2 43 320 192 71 43 71 3053
12 2 40 320 192 66 40 66 2640
13 2 36 320 192 60 36 60 2160
14 1 34 640 384 56 34 56 1904
From this, it's clear that either a 4x4 or 5x3 layout will produce the largest rectangles. It's also clear that the rectangle size (as a function of row count) is worst (smallest) at the extremes and best (largest) at an intermediate point. Assuming that the number of rectangles is modest, you could simply code the calculation above in your language of choice, but bail out as soon as the resulting area starts to decrease after rising to a maximum.
That's a quick and dirty (but, I hope, fairly obvious) solution. If the number of rectangles became large enough to bother, you could tweak for performance in a variety of ways:
use a more sophisticated search algorithm (partition the space and recursively search the best segment),
if the number of rectangles is growing during the program, keep the previous result and only search nearby solutions,
apply a bit of calculus to get a faster, precise, but less obvious formula.
This is almost exactly like kenneth's question here on SO. He also wrote it up on his blog.
If you scale the proportions in one dimension so that you are packing squares, it becomes the same problem.
One way I like to do that is to use the square root of the area:
Let
r = number of rectangles
w = width of display
h = height of display
Then,
A = (w * h) / r is the area per rectangle
and
L = sqrt(A) is the base length of each rectangle.
If they are not square, then just multiply accordingly to keep the same ratio.
Another way to do a similar thing is to just take the square root of the number of rectangles. That'll give you one dimension of your grid (i.e. the number of columns):
C = sqrt(n) is the number of columns in your grid
and
R = n / C is the number of rows.
Note that you will have to take the ceiling of one of these and the floor of the other; otherwise you will truncate numbers and might miss a row.
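A short Ruby sketch of both square-root variants (the display size and rectangle count are example values; in the count-based version I round both results up, which guarantees rows * columns >= n):

w, h, n = 640.0, 480.0, 14

# Area-based: side of the square each rectangle could occupy on average.
area_per_rect = (w * h) / n
base_length   = Math.sqrt(area_per_rect)   # ~148; scale by the aspect ratio if not square

# Count-based: take columns from sqrt(n), then enough rows to hold the rest.
columns = Math.sqrt(n).ceil                # 4
rows    = (n.to_f / columns).ceil          # 4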
Your mention of rows and columns suggests that you envisaged arranging the rectangles in a grid, possibly with a few spaces (e.g. some of the bottom row) unfilled. Assuming this is the case:
Suppose you scale the objects such that (an as-yet unknown number) n of them fit across the screen. Then
objectScale=screenWidth/(n*objectWidth)
Now suppose there are N objects, so there will be
nRows = ceil(N/n)
rows of objects (where ceil is the Ceiling function), which will take up
nRows*objectScale*objectHeight
of vertical height. We need to find n, and want to choose the smallest n such that this distance is smaller than screenHeight.
A simple mathematical expression for n is made trickier by the presence of the ceiling function. If the number of columns is going to be fairly small, probably the easiest way to find n is just to loop through increasing n until the inequality is satisfied.
Edit: We can start the loop with the upper bound of
floor(sqrt(N*objectHeight*screenWidth/(screenHeight*objectWidth)))
for n, and work down: the solution is then found in O(sqrt(N)). An O(1) solution is to assume that
nRows = N/n + 1
or to take
n=ceil(sqrt(N*objectHeight*screenWidth/(screenHeight*objectWidth)))
(the solution of Matthieu M.) but these have the disadvantage that the value of n may not be optimal.
Border cases occur when N=0, and when N=1 and the aspect ratio of the objects is such that objectHeight/objectWidth > screenHeight/screenWidth - both of these are easy to deal with.
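A minimal Ruby sketch of the loop described above (variable names follow the answer; the screen and object dimensions are example values only):

screen_w, screen_h = 640.0, 480.0
object_w, object_h = 5.0, 3.0
total_objects = 14                              # N

n = 0                                           # columns across the screen
loop do
  n += 1
  scale  = screen_w / (n * object_w)            # objectScale
  n_rows = (total_objects.to_f / n).ceil        # nRows = ceil(N / n)
  break if n_rows * scale * object_h <= screen_h
end
# n is now the smallest column count whose rows also fit vertically (4 here).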
