MFCC feature extraction, Librosa

I want to extract MFCC features from an audio file sampled at 8000 Hz, with a frame size of 20 ms and an overlap of 10 ms. What should the parameters of the librosa.feature.mfcc() function be? Does the code below specify 20 ms chunks with 10 ms overlap?
import librosa as l
x, sr = l.load('/home/user/Data/Audio/Tracks/Dev/FS_P01_dev_001.wav', sr = 8000)
mfccs = l.feature.mfcc(x, sr=sr, n_mfcc = 24, hop_length = 160)
The audio file is 1800 seconds long. Does that mean I would get 24 MFCCs for each of the (1800/0.01)-1 chunks of the audio?

1800 seconds at 8000 Hz are obviously 1800 * 8000 = 14400000 samples.
If your hop length is 160, you get roughly 14400000 / 160 = 90000 MFCC frames with 24 coefficients each. So this is clearly not (1800 / 0.01) - 1 = 179999 (it is off by a factor of roughly 2).
Note that I said roughly, because I only used the hop length and ignored the window length. The hop length is the number of samples the window is moved at each step. How many hops you can fit depends on whether you pad somehow or not, and if you decide not to pad, the number of frames also depends on your window size.
To get back to your question: You have to ask yourself how many samples are 10 ms?
If 1 s contains 8000 samples (that's what 8000 Hz means), how many samples are in 0.01 s? That's 8000 * 0.01 = 80 samples.
This means you have a hop length of 80 samples and a window length of 160 samples (0.02 s—twice as long).
Now you should tell librosa to use this info, like this:
import librosa as l
x, sr = l.load('/home/user/Data/Audio/Tracks/Dev/FS_P01_dev_001.wav', sr = 8000)
n_fft = int(sr * 0.02) # window length: 0.02 s
hop_length = n_fft // 2 # usually one specifies the hop length as a fraction of the window length
mfccs = l.feature.mfcc(y=x, sr=sr, n_mfcc=24, hop_length=hop_length, n_fft=n_fft)
# check the dimensions
print(mfccs.shape)
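Continuing the snippet above, here is a rough sanity check of the shape you should expect, assuming librosa's default center=True padding (where the number of frames is about 1 + len(x) // hop_length); adjust if you pass center=False:
# Rough expected shape, assuming the default center=True padding.
expected_frames = 1 + len(x) // hop_length  # about 1 + 14400000 / 80 = 180001 for 1800 s at 8000 Hz
print(expected_frames, mfccs.shape)         # mfccs.shape should be roughly (24, expected_frames)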
Hope this helps.

Related

Iterating to figure out combinations of variable number of elements

I'm making a program that uses dynamic programming to decide how to distribute some files (movies) among DVDs so that it uses the fewest DVDs.
After much thought I decided that a good way to do it is to look at every possible combination of movies that is less than 4.38 GB (the actual capacity of a DVD), pick the largest one (i.e. the one that wastes the least space), remove the movies in that combination from the list, and repeat until it runs out of movies.
The problem is that I don't know how to loop so I can figure out every possible combination: the combinations contain a variable number of movies, so a fixed number of nested loops cannot be used.
pseudo-code:
best_combination = []
best_combination_size = 0
some kind of loop over all combinations:
    if current_combination_size < 4.38 and current_combination_size > best_combination_size:
        best_combination = current_combination
        best_combination_size = current_combination_size
print(best_combination)
delete best_combination from list_of_movies
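For what it's worth, here is a minimal sketch of that kind of loop using itertools.combinations, which avoids nesting a variable number of loops; the movie sizes are made-up values in GB and only the 4.38 GB limit comes from above (brute force like this blows up quickly as the list grows):
from itertools import combinations

movie_sizes = [1.2, 0.7, 2.3, 3.9, 0.5, 1.8]      # illustrative sizes in GB
DVD_SIZE = 4.38

best_combination, best_size = (), 0.0
for k in range(1, len(movie_sizes) + 1):          # every possible combination length
    for combo in combinations(movie_sizes, k):
        size = sum(combo)
        if size <= DVD_SIZE and size > best_size:
            best_combination, best_size = combo, size
print(best_combination, best_size)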
First time posting a question, so go easy on me guys!!
Thanks in advance.
P.S. I figured out a way to do it using Dijkstra's algorithm, which I think would be fast but not memory friendly. If anybody is interested I would gladly discuss it.
You should really stick to common bin-packing heuristics. The Wikipedia article gives a good overview of approaches, including links to problem-tailored exact approaches. But always keep in mind: it's an NP-hard problem!
I will show you an example supporting my hint that you should stick to heuristics.
The following Python code:
creates parameterized random problems (normal distributions with multiple means/stds; acceptance sampling to make sure that no movie is bigger than a DVD)
uses some random bin-packing library which implements some greedy heuristic (I didn't try or test this lib before, so no guarantees! I have no idea which heuristic is used)
uses a naive mixed-integer programming formulation (solved by a commercial solver; open-source solvers like cbc struggle, but might be used for good approximate solutions)
Code
import numpy as np
from cvxpy import *
from time import time

""" Generate some test-data """
np.random.seed(1)
N = 150  # movies
means = [700, 1400, 4300]
stds = [100, 300, 500]
DVD_SIZE = 4400

movies = []
for movie in range(N):
    while True:
        random_mean_index = np.random.randint(low=0, high=len(means))
        random_size = np.random.randn() * stds[random_mean_index] + means[random_mean_index]
        if random_size <= DVD_SIZE:
            movies.append(random_size)
            break

""" HEURISTIC SOLUTION """
import binpacking
start = time()
bins = binpacking.to_constant_volume(movies, DVD_SIZE)
end = time()
print('Heuristic solution: ')
for b in bins:
    print(b)
print('used bins: ', len(bins))
print('used time (seconds): ', end-start)

""" Preprocessing """
movie_sizes_sorted = sorted(movies)
max_movies_per_dvd = 0
occupied = 0
for i in range(N):
    if occupied + movie_sizes_sorted[i] <= DVD_SIZE:
        max_movies_per_dvd += 1
        occupied += movie_sizes_sorted[i]
    else:
        break

""" Solve problem """
# Variables
X = Bool(N, N)  # N * number-DVDS
I = Bool(N)     # indicator: DVD used

# Constraints
constraints = []
# (1) DVDs not overfilled
for dvd in range(N):
    constraints.append(sum_entries(mul_elemwise(movies, X[:, dvd])) <= DVD_SIZE)
# (2) All movies distributed exactly once
for movie in range(N):
    constraints.append(sum_entries(X[movie, :]) == 1)
# (3) Indicators
for dvd in range(N):
    constraints.append(sum_entries(X[:, dvd]) <= I[dvd] * (max_movies_per_dvd + 1))

# Objective
objective = Minimize(sum_entries(I))

# Problem
problem = Problem(objective, constraints)
start = time()
problem.solve(solver=GUROBI, MIPFocus=1, verbose=True)
#problem.solve(solver=CBC, CliqueCuts=True)#, GomoryCuts=True, KnapsackCuts=True, verbose=True)#, GomoryCuts=True, MIRCuts=True, ProbingCuts=True,
#CliqueCuts=True, FlowCoverCuts=True, LiftProjectCuts=True,
#verbose=True)
end = time()

""" Print solution """
for dvd in range(N):
    movies_ = []
    for movie in range(N):
        if np.isclose(X.value[movie, dvd], 1):
            movies_.append(movies[movie])
    if movies_:
        print('DVD')
        for movie in movies_:
            print(' movie with size: ', movie)
print('Distributed ', N, ' movies to ', int(objective.value), ' dvds')
print('Optimization took (seconds): ', end-start)
Partial Output
Heuristic solution:
-------------------
('used bins: ', 60)
('used time (seconds): ', 0.0045168399810791016)
MIP-approach:
-------------
Root relaxation: objective 2.142857e+01, 1921 iterations, 0.10 seconds
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
0 0 21.42857 0 120 106.00000 21.42857 79.8% - 0s
H 0 0 68.0000000 21.42857 68.5% - 0s
H 0 0 63.0000000 21.42857 66.0% - 0s
0 0 21.42857 0 250 63.00000 21.42857 66.0% - 1s
H 0 0 62.0000000 21.42857 65.4% - 2s
0 0 21.42857 0 256 62.00000 21.42857 65.4% - 2s
0 0 21.42857 0 304 62.00000 21.42857 65.4% - 2s
0 0 21.42857 0 109 62.00000 21.42857 65.4% - 3s
0 2 21.42857 0 108 62.00000 21.42857 65.4% - 4s
40 2 27.61568 20 93 62.00000 27.61568 55.5% 110 5s
H 156 10 61.0000000 58.00000 4.92% 55.3 8s
262 4 59.00000 84 61 61.00000 59.00000 3.28% 44.2 10s
413 81 infeasible 110 61.00000 59.00000 3.28% 37.2 15s
H 417 78 60.0000000 59.00000 1.67% 36.9 15s
1834 1212 59.00000 232 40 60.00000 59.00000 1.67% 25.7 20s
...
...
57011 44660 infeasible 520 60.00000 59.00000 1.67% 27.1 456s
57337 44972 59.00000 527 34 60.00000 59.00000 1.67% 27.1 460s
58445 45817 59.00000 80 94 60.00000 59.00000 1.67% 26.9 466s
59387 46592 59.00000 340 65 60.00000 59.00000 1.67% 26.8 472s
Analysis
Some observations regarding the example above:
the heuristic obtains some solution of value 60 instantly
the commercial solver needs more time but also finds a solution of value 60 (15 secs)
it also tries to find a better solution or prove that there is none (MIP solvers are complete = they find the optimal solution or prove that none exists, given infinite time!)
no progress for some time!
but: we have a proof that no solution using fewer than 59 DVDs exists
= maybe you can save one DVD by solving the problem optimally, but such a solution is hard to find and we don't (yet) know whether it exists!
Remarks
The observations above are heavily dependent on the data statistics
It's easy to try other problems (maybe smaller ones) where the commercial MIP solver finds a solution which uses 1 less DVD (e.g. 49 vs. 50)
It's not worth it (remember: open-source solvers struggle even more)
The formulation is very simple and not tuned at all (don't blame only the solvers)
There are exact algorithms (which might be much more complex to implement) which could be appropriate
Conclusion
Heuristics are very easy to implement and provide very good solutions in general. Most of them also come with very good theoretical guarantees (e.g. the first-fit-decreasing heuristic uses at most 11/9 * OPT + 1 DVDs compared to the optimal solution). Despite being keen on optimization in general, I would probably use the heuristics approach here.
The general problem is also very popular, so there should be a good library for it in many programming languages!
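For illustration, a minimal sketch of the first-fit-decreasing heuristic mentioned above (the sizes are made up; this is not tuned in any way):
def first_fit_decreasing(sizes, capacity):
    """Place each movie, largest first, on the first DVD it fits on."""
    bins = []                                   # each bin is a list of movie sizes
    for size in sorted(sizes, reverse=True):
        for b in bins:
            if sum(b) + size <= capacity:
                b.append(size)
                break
        else:
            bins.append([size])                 # no existing DVD fits -> open a new one
    return bins

print(len(first_fit_decreasing([700, 1400, 4300, 2100, 900, 3200], 4400)))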
Without claiming that the solution this answer presents is optimized, optimal, or possesses any other remarkable qualities, here is a greedy approach to the DVD packing problem.
import System.Random
import Data.List
import Data.Ord

-- F# programmers are so used to this operator, I just cannot live without it ...yet.
(|>) a b = b a

data Dvd = Dvd { duration :: Int, movies :: [Int] } deriving (Show,Eq)

dvdCapacity = 1000 :: Int -- a dvd has capacity for 1000 time units - arbitrary unit

-- the duration of a movie is between 1 and 1000 time units
r = randomRs (1,1000) (mkStdGen 42) :: [Int]

-- our test data set of 1000 movies, random movie durations drawn from r
allMovies = zip [1..1000] (take 1000 r)
allMoviesSorted = reverse $ sortBy (comparing snd) allMovies

remainingCapacity dvd = dvdCapacity - duration dvd

emptyDvd = Dvd { duration = 0, movies = [] }

-- from the remaining movies, pick the largest one with at most maxDuration length.
pickLargest remaining maxDuration =
  let (left,right) = remaining |> break (\ (a,b) -> b <= maxDuration)
      (h,t) = case right of
                [] -> (Nothing,[])
                otherwise -> (Just (head right), right |> tail)
  in
    (h,[left, t] |> concat)

-- add a track (movie) to a dvd
addTrack dvd track =
  Dvd {duration = (duration dvd) + snd track,
       movies = fst track : (movies dvd) }

-- pick dvd from dvds with largest remaining capacity
-- and add the largest remaining fitting track
greedyPack movies dvds
  | movies == [] = (dvds,[])
  | otherwise =
      let dvds' = reverse $ sortBy (comparing remainingCapacity) dvds
          (picked,movies') =
            case dvds' of
              [] -> (Nothing, movies)
              (x:xs) -> pickLargest movies (remainingCapacity x)
      in
        case picked of
          Nothing ->
            -- None of the current dvds had enough capacity remaining
            -- to pick another movie and add it. -> Add new empty dvd
            -- and run greedyPack again.
            greedyPack movies' (emptyDvd : dvds')
          Just p ->
            -- The best fitting movie could be added to the
            -- dvd with the largest remaining capacity.
            greedyPack movies' (addTrack (head dvds') p : tail dvds')

(result,residual) = greedyPack allMoviesSorted [emptyDvd]
usedDvdsCount = length result
totalPlayTime = allMovies |> foldl (\ s (i,d) -> s + d) 0
optimalDvdsCount = round $ 0.5 + fromIntegral totalPlayTime / fromIntegral dvdCapacity
solutionQuality = length result - optimalDvdsCount
Compared to the theoretical optimal DVD count, it wastes 4 extra DVDs on the given data set.

Why average loss goes up when training using Vowpal Wabbit

I tried to use VW to train a regression model on a small set of examples (about 3112). I think I'm doing it correctly, yet it showed me weird results. Dug around but didn't find anything helpful.
$ cat sh600000.feat | vw --l1 1e-8 --l2 1e-8 --readable_model model -b 24 --passes 10 --cache_file cache
using l1 regularization = 1e-08
using l2 regularization = 1e-08
Num weight bits = 24
learning rate = 0.5
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
using cache_file = cache
ignoring text input in favor of cache input
num sources = 1
average since example example current current current
loss last counter weight label predict features
0.040000 0.040000 1 1.0 -0.2000 0.0000 79
0.051155 0.062310 2 2.0 0.2000 -0.0496 79
0.046606 0.042056 4 4.0 0.4100 0.1482 79
0.052160 0.057715 8 8.0 0.0200 0.0021 78
0.064936 0.077711 16 16.0 -0.1800 0.0547 77
0.060507 0.056079 32 32.0 0.0000 0.3164 79
0.136933 0.213358 64 64.0 -0.5900 -0.0850 79
0.151692 0.166452 128 128.0 0.0700 0.0060 79
0.133965 0.116238 256 256.0 0.0900 -0.0446 78
0.179995 0.226024 512 512.0 0.3700 -0.0217 79
0.109296 0.038597 1024 1024.0 0.1200 -0.0728 79
0.579360 1.049425 2048 2048.0 -0.3700 -0.0084 79
0.485389 0.485389 4096 4096.0 1.9600 0.3934 79 h
0.517748 0.550036 8192 8192.0 0.0700 0.0334 79 h
finished run
number of examples per pass = 2847
passes used = 5
weighted example sum = 14236
weighted label sum = -155.98
average loss = 0.490685 h
best constant = -0.0109567
total feature number = 1121506
$ wc model
41 48 657 model
Questions:
Why is the number of features in the output (readable) model less than the number of actual features? I counted that the training data contains 78 features (plus the bias, that's the 79 shown during training). The number of feature bits is 24, which should be far more than enough to avoid collisions.
Why does the average loss actually go up during training, as you can see in the above example?
(Minor) I tried to increase the number of feature bits to 32, and it produced an empty model. Why?
EDIT:
I tried to shuffle the input file, as well as using --holdout_off, as suggested. But the result is still almost the same - the average loss goes up.
$ cat sh600000.feat.shuf | vw --l1 1e-8 --l2 1e-8 --readable_model model -b 24 --passes 10 --cache_file cache --holdout_off
using l1 regularization = 1e-08
using l2 regularization = 1e-08
Num weight bits = 24
learning rate = 0.5
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
using cache_file = cache
ignoring text input in favor of cache input
num sources = 1
average since example example current current current
loss last counter weight label predict features
0.040000 0.040000 1 1.0 -0.2000 0.0000 79
0.051155 0.062310 2 2.0 0.2000 -0.0496 79
0.046606 0.042056 4 4.0 0.4100 0.1482 79
0.052160 0.057715 8 8.0 0.0200 0.0021 78
0.071332 0.090504 16 16.0 0.0300 0.1203 79
0.043720 0.016108 32 32.0 -0.2200 -0.1971 78
0.142895 0.242071 64 64.0 0.0100 -0.1531 79
0.158564 0.174232 128 128.0 0.0500 -0.0439 79
0.150691 0.142818 256 256.0 0.3200 0.1466 79
0.197050 0.243408 512 512.0 0.2300 -0.0459 79
0.117398 0.037747 1024 1024.0 0.0400 0.0284 79
0.636949 1.156501 2048 2048.0 1.2500 -0.0152 79
0.363364 0.089779 4096 4096.0 0.1800 0.0071 79
0.477569 0.591774 8192 8192.0 -0.4800 0.0065 79
0.411068 0.344567 16384 16384.0 0.0700 0.0450 77
finished run
number of examples per pass = 3112
passes used = 10
weighted example sum = 31120
weighted label sum = -105.5
average loss = 0.423404
best constant = -0.0033901
total feature number = 2451800
The training examples are unique, so I doubt there is an over-fitting problem (which, as I understand it, usually happens when the number of inputs is too small compared to the number of features).
EDIT2:
I tried to print the average loss for every pass over the examples, and it mostly remains constant.
$ cat dist/sh600000.feat | vw --l1 1e-8 --l2 1e-8 -f dist/model -P 3112 --passes 10 -b 24 --cache_file dist/cache
using l1 regularization = 1e-08
using l2 regularization = 1e-08
Num weight bits = 24
learning rate = 0.5
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
final_regressor = dist/model
using cache_file = dist/cache
ignoring text input in favor of cache input
num sources = 1
average since example example current current current
loss last counter weight label predict features
0.498822 0.498822 3112 3112.0 0.0800 0.0015 79 h
0.476677 0.454595 6224 6224.0 -0.2200 -0.0085 79 h
0.466413 0.445856 9336 9336.0 0.0200 -0.0022 79 h
0.490221 0.561506 12448 12448.0 0.0700 -0.1113 79 h
finished run
number of examples per pass = 2847
passes used = 5
weighted example sum = 14236
weighted label sum = -155.98
average loss = 0.490685 h
best constant = -0.0109567
total feature number = 1121506
Also another try without the --l1, --l2 and -b parameters:
$ cat dist/sh600000.feat | vw -f dist/model -P 3112 --passes 10 --cache_file dist/cache
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
final_regressor = dist/model
using cache_file = dist/cache
ignoring text input in favor of cache input
num sources = 1
average since example example current current current
loss last counter weight label predict features
0.520286 0.520286 3112 3112.0 0.0800 -0.0021 79 h
0.488581 0.456967 6224 6224.0 -0.2200 -0.0137 79 h
0.474247 0.445538 9336 9336.0 0.0200 -0.0299 79 h
0.496580 0.563450 12448 12448.0 0.0700 -0.1727 79 h
0.533413 0.680958 15560 15560.0 -0.1700 0.0322 79 h
0.524531 0.480201 18672 18672.0 -0.9800 -0.0573 79 h
finished run
number of examples per pass = 2801
passes used = 7
weighted example sum = 19608
weighted label sum = -212.58
average loss = 0.491739 h
best constant = -0.0108415
total feature number = 1544713
Does that mean it's normal for the average loss to go up during one pass, but as long as multiple passes give the same loss then it's fine?
The model file stores only non-zero weights. So most likely the other weights were zeroed out, especially since you are using --l1.
It may be caused by many reasons. Perhaps your dataset isn't shuffled well enough. If you sort your dataset so that examples labeled -1 are in the first half and examples labeled 1 are in the second, then your model will show very good convergence on the first half, but you'll see the average loss bump up as it reaches the second half. So it may be an imbalance in the dataset. As for the last two losses - these are holdout losses (marked with 'h' at the end of the line) and may indicate that the model is overfitting. Please refer to my other answer.
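If you want to rule out ordering effects, one simple way to shuffle the training file before piping it to vw is sketched below (assuming the file fits in memory; the file name is the one from the question):
import random

# Shuffle the lines of the VW training file to remove any label ordering.
with open('sh600000.feat') as f:
    lines = f.readlines()
random.shuffle(lines)
with open('sh600000.feat.shuf', 'w') as f:
    f.writelines(lines)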
Well, in the master branch the use of -b 32 is currently blocked. You should use up to -b 31. In practice, -b 24-28 is usually enough even for tens of thousands of features.
I would recommend getting an up-to-date VW version from GitHub.

Direct Table & Lookup Table

How do I measure the memory size of an image in the direct-coding 24-bit RGB color model and in a 24-bit 256-entry look-up table representation? For example: given an image of resolution 800*600, how much space is required to save the image using direct coding and using a look-up table?
For a regular 24-bit RGB representation you most probably just have to multiply the number of pixels by the number of bytes per pixel. 24 bits = 3 bytes, so the size is 800 * 600 * 3 bytes = 1440000 bytes ≈ 1.37 MiB. In some cases you may have rows of an image aligned on some boundary in memory, usually 4 or 8 or 32 bytes. But since 800 is divisible by 32, this will not change anything: still 1.37 MiB.
Now, for a look-up table, you have 1 byte per pixel, since you only have to address one entry in the table. This yields 800 * 600 * 1 = 480000 bytes ≈ 0.46 MiB. Plus the table itself: 256 colors, 24 bits (3 bytes) each - 256 * 3 = 768 bytes. Negligible compared to the size of the image.
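The same arithmetic, as a quick sketch:
width, height = 800, 600

# Direct 24-bit RGB: 3 bytes per pixel.
direct_bytes = width * height * 3               # 1,440,000 bytes ~ 1.37 MiB

# 256-entry look-up table: 1 byte per pixel plus the 256 * 3 byte table.
lut_bytes = width * height * 1 + 256 * 3        # 480,768 bytes ~ 0.46 MiB

print(direct_bytes / 2**20, lut_bytes / 2**20)  # sizes in MiB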

Basic Velocity Algorithm?

Given the following dataset for a single article on my site:
Article 1
2/1/2010 100
2/2/2010 80
2/3/2010 60
Article 2
2/1/2010 20000
2/2/2010 25000
2/3/2010 23000
where column 1 is the date and column 2 is the number of pageviews for an article. What is a basic velocity calculation that can be done to determine if this article is trending upwards or downwards for the most recent 3 days?
Caveat: the articles only know their own pageview totals, not the total number of pageviews across the site. Ideally the result would be a number between 0 and 1. Any pointers to what this class of algorithms is called?
thanks!
update: Your data actually already is a list of velocities (pageviews/day). The following answer simply shows how to find the average velocity over the past three days. See my other answer for how to calculate pageview acceleration, which is the real statistic you are probably looking for.
Velocity is simply the change in a value (delta pageviews) over time:
For article 1 on 2/3/2010:
delta pageviews = 100 + 80 + 60
= 240 pageviews
delta time = 3 days
pageview velocity (over last three days) = [delta pageviews] / [delta time]
= 240 / 3
= 80 pageviews/day
For article 2 on 2/3/2010:
delta pageviews = 20000 + 25000 + 23000
= 68000 pageviews
delta time = 3 days
pageview velocity (over last three days) = [delta pageviews] / [delta time]
= 68,000 / 3
= 22,666 + 2/3 pageviews/day
Now that we know the maximum velocity, we can scale all the velocities to get relative velocities between 0 and 1 (or between 0% and 100%):
relative pageview velocity of article 1 = velocity / MAX_VELOCITY
= 80 / (22,666 + 2/3)
~ 0.0035294118
~ 0.35294118%
relative pageview velocity of article 2 = velocity / MAX_VELOCITY
= (22,666 + 2/3)/(22,666 + 2/3)
= 1
= 100%
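The same calculation as a small Python sketch (the numbers are the ones from the example above):
article_1 = [100, 80, 60]          # pageviews/day over the last three days
article_2 = [20000, 25000, 23000]

def avg_velocity(daily_pageviews):
    """Average pageviews per day over the window."""
    return sum(daily_pageviews) / len(daily_pageviews)

v1, v2 = avg_velocity(article_1), avg_velocity(article_2)
max_v = max(v1, v2)
print(v1 / max_v, v2 / max_v)      # relative velocities in [0, 1]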
"Pageview trend" likely refers to pageview acceleration, not velocity. Your dataset actually already is a list of velocities (pageviews/day). Pageviews are non-decreasing values, so pageview velocity can never be negative. The following describes how to calculate pageview acceleration, which may be negative.
PV_acceleration(t1,t2) = (PV_velocity{t2} - PV_velocity{t1}) / (t2 - t1)
("PV" == "Pageview")
Explanation:
Acceleration is simply change in velocity divided by change in time. Since your dataset is a list of page view velocities, you can plug them directly into the formula:
PV_acceleration("2/1/2010", "2/3/2010") = (60 - 100) / ("2/3/2010" - "2/1/2010")
= -40 / 2
= -20 pageviews per day per day
Note the data for "2/2/2010" was not used. An alternate method is to calculate three PV_accelerations (using a date range that goes back only a single day) and averaging them. There is not enough data in your example to do this for three days, but here is how to do it for the last two days:
PV_acceleration("2/3/2010", "2/2/2010") = (60 - 80) / ("2/3/2010" - "2/2/2010")
= -20 / 1
= -20 pageviews per day per day
PV_acceleration("2/2/2010", "2/1/2010") = (80 - 100) / ("2/2/2010" - "2/1/2010")
= -20 / 1
= -20 pageviews per day per day
PV_acceleration_average("2/3/2010", "2/2/2010") = -20 + -20 / 2
= -20 pageviews per day per day
This alternate method did not make a difference for the article 1 data because the page view acceleration did not change between the two days, but it will make a difference for article 2.
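The same acceleration formula, as a short sketch (dates are reduced to one-day steps for simplicity):
def pv_acceleration(v1, v2, dt_days=1):
    """Change in pageview velocity (pageviews/day) per day."""
    return (v2 - v1) / dt_days

daily = [100, 80, 60]   # article 1: pageviews/day on 2/1, 2/2, 2/3
accels = [pv_acceleration(a, b) for a, b in zip(daily, daily[1:])]
print(accels, sum(accels) / len(accels))   # [-20.0, -20.0], average -20.0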
Just a link to an article about the 'trending' algorithms that Reddit, StumbleUpon, and Hacker News use, among others:
http://www.seomoz.org/blog/reddit-stumbleupon-delicious-and-hacker-news-algorithms-exposed

How to optimize the layout of rectangles

I have a dynamic number of equally proportioned and sized rectangular objects that I want to optimally display on the screen. I can resize the objects but need to maintain proportion.
I know what the screen dimensions are.
How can I calculate the optimal number of rows and columns that I will need to divide the screen in to and what size I will need to scale the objects to?
Thanks,
Jamie.
Assuming that all rectangles have the same dimensions and orientation, and that these should not be changed.
Let's play!
// Proportion of the screen
// w,h width and height of your rectangles
// W,H width and height of the screen
// N number of your rectangles that you would like to fit in
// ratio
r = (h*W) / (w*H)
// This ratio is important since we can define the following relationship
// nbRows and nbColumns are what you are looking for
// nbColumns = nbRows * r (there will be problems of integers)
// we are looking for the minimum values of nbRows and nbColumns such that
// N <= nbRows * nbColumns = (nbRows ^ 2) * r
nbRows = ceil ( sqrt ( N / r ) ) // r is positive...
nbColumns = ceil ( N / nbRows )
I hope I got my maths right, but that cannot be far from what you are looking for ;)
EDIT:
there is not much difference between having a ratio and the width and height...
// If ratio = w/h
r = W / (H * ratio)
// If ratio = h/w
r = ratio * (W/H)
And then you're back to using 'r' to find out how many rows and columns to use.
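Here is a short sketch of this approach (the function and variable names are just illustrative; r relates columns to rows so that the grid cells keep the rectangles' aspect ratio):
import math

def grid_for(N, w, h, W, H):
    """Smallest nbRows x nbColumns grid of w:h rectangles that holds N of them on a W x H screen."""
    r = (h * W) / (w * H)                  # nbColumns ~ nbRows * r when cells match the aspect ratio
    nb_rows = math.ceil(math.sqrt(N / r))
    nb_cols = math.ceil(N / nb_rows)
    return nb_rows, nb_cols

print(grid_for(14, 5, 3, 640, 480))        # 14 rectangles of 5:3 on a 640x480 screen -> (5, 3)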
Jamie, I interpreted "optimal number of rows and columns" to mean "how many rows and columns will provide the largest rectangles, consistent with the required proportions and screen size". Here's a simple approach for that interpretation.
Each possible choice (number of rows and columns of rectangles) results in a maximum possible size of rectangle for the specified proportions. Looping over the possible choices and computing the resulting size implements a simple linear search over the space of possible solutions. Here's a bit of code that does that, using an example screen of 480 x 640 and rectangles in a 3 x 5 proportion.
def min (a, b)
  a < b ? a : b
end

screenh, screenw = 480, 640
recth, rectw = 3.0, 5.0
ratio = recth / rectw
puts ratio
nrect = 14

(1..nrect).each do |nhigh|
  nwide = ((nrect + nhigh - 1) / nhigh).truncate
  maxh, maxw = (screenh / nhigh).truncate, (screenw / nwide).truncate
  relh, relw = (maxw * ratio).truncate, (maxh / ratio).truncate
  acth, actw = min(maxh, relh), min(maxw, relw)
  area = acth * actw
  puts ([nhigh, nwide, maxh, maxw, relh, relw, acth, actw, area].join("\t"))
end
Running that code provides the following trace:
1 14 480 45 27 800 27 45 1215
2 7 240 91 54 400 54 91 4914
3 5 160 128 76 266 76 128 9728
4 4 120 160 96 200 96 160 15360
5 3 96 213 127 160 96 160 15360
6 3 80 213 127 133 80 133 10640
7 2 68 320 192 113 68 113 7684
8 2 60 320 192 100 60 100 6000
9 2 53 320 192 88 53 88 4664
10 2 48 320 192 80 48 80 3840
11 2 43 320 192 71 43 71 3053
12 2 40 320 192 66 40 66 2640
13 2 36 320 192 60 36 60 2160
14 1 34 640 384 56 34 56 1904
From this, it's clear that either a 4x4 or 5x3 layout will produce the largest rectangles. It's also clear that the rectangle size (as a function of row count) is worst (smallest) at the extremes and best (largest) at an intermediate point. Assuming that the number of rectangles is modest, you could simply code the calculation above in your language of choice, but bail out as soon as the resulting area starts to decrease after rising to a maximum.
That's a quick and dirty (but, I hope, fairly obvious) solution. If the number of rectangles became large enough to bother, you could tweak for performance in a variety of ways:
use a more sophisticated search algorithm (partition the space and recursively search the best segment),
if the number of rectangles is growing during the program, keep the previous result and only search nearby solutions,
apply a bit of calculus to get a faster, precise, but less obvious formula.
This is almost exactly like kenneth's question here on SO. He also wrote it up on his blog.
If you scale the proportions in one dimension so that you are packing squares, it becomes the same problem.
One way I like to do that is to use the square root of the area:
Let
r = number of rectangles
w = width of display
h = height of display
Then,
A = (w * h) / r is the area per rectangle
and
L = sqrt(A) is the base length of each rectangle.
If they are not square, then just multiply accordingly to keep the same ratio.
Another way to do a similar thing is to just take the square root of the number of rectangles. That'll give you one dimension of your grid (i.e. the number of columns):
C = sqrt(n) is the number of columns in your grid
and
R = n / C is the number of rows.
Note that these will not generally be integers: round C up and then take R = ceil(n / C), otherwise truncating both might leave you one row short.
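A quick sketch of both square-root ideas (the 14 rectangles with a 5:3 ratio on a 640x480 screen are just example numbers):
import math

n, W, H = 14, 640.0, 480.0
ratio = 5.0 / 3.0                                  # rectangle width / height

# Idea 1: base cell size from the area per rectangle, stretched to the desired ratio.
A = (W * H) / n
L = math.sqrt(A)
cell_w, cell_h = L * math.sqrt(ratio), L / math.sqrt(ratio)

# Idea 2: grid shape purely from the count, rounded up so rows * cols >= n.
cols = math.ceil(math.sqrt(n))
rows = math.ceil(n / cols)

print(cell_w, cell_h, cols, rows)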
Your mention of rows and columns suggests that you envisaged arranging the rectangles in a grid, possibly with a few spaces (e.g. some of the bottom row) unfilled. Assuming this is the case:
Suppose you scale the objects such that (an as-yet unknown number) n of them fit across the screen. Then
objectScale=screenWidth/(n*objectWidth)
Now suppose there are N objects, so there will be
nRows = ceil(N/n)
rows of objects (where ceil is the Ceiling function), which will take up
nRows*objectScale*objectHeight
of vertical height. We need to find n, and want to choose the smallest n such that this distance is smaller than screenHeight.
A simple mathematical expression for n is made trickier by the presence of the ceiling function. If the number of columns is going to be fairly small, probably the easiest way to find n is just to loop through increasing n until the inequality is satisfied.
Edit: We can start the loop with the upper bound of
floor(sqrt(N*objectHeight*screenWidth/(screenHeight*objectWidth)))
for n, and work down: the solution is then found in O(sqrt(N)). An O(1) solution is to assume that
nRows = N/n + 1
or to take
n=ceil(sqrt(N*objectHeight*screenWidth/(screenHeight*objectWidth)))
(the solution of Matthieu M.) but these have the disadvantage that the value of n may not be optimal.
Border cases occur when N=0, and when N=1 and the aspect ratio of the objects is such that objectHeight/objectWidth > screenHeight/screenWidth - both of these are easy to deal with.
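A small sketch of the loop described above (variable names follow the prose; treat it as illustrative):
import math

def smallest_columns(N, objectWidth, objectHeight, screenWidth, screenHeight):
    """Smallest column count n whose ceil(N/n) rows, scaled to fill the width, still fit the height."""
    for n in range(1, N + 1):
        objectScale = screenWidth / (n * objectWidth)
        nRows = math.ceil(N / n)
        if nRows * objectScale * objectHeight <= screenHeight:
            return n
    return N   # even a single row is too tall when scaled to the full width; scale by height instead

print(smallest_columns(14, 5, 3, 640, 480))        # -> 4 columns for 14 objects of 5:3 on 640x480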
