I'm making a program that uses dynamic programming to decide how to distribute some files (Movies) among DVDs so that it uses the least number of DVDs.
After much thought I decided that a good approach is to look at every possible combination of movies whose total size is less than 4.38 GB (the actual capacity of a DVD), pick the largest such combination (i.e. the one that wastes the least space), remove those movies from the list, and repeat until it runs out of movies.
The problem is that I don't know how to loop so I can figure out every possible combination, given that movies vary in size, so a specific number of nested loops cannot be used.
pseudo-code:
best_combination = []
best_combination_size = 0
Some kind of loop over candidate combinations:
    if current_combination_size <= 4.38 and current_combination_size > best_combination_size:
        best_combination = current_combination
        best_combination_size = current_combination_size
print(best_combination)
delete best_combination from list_of_movies
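For illustration, here is a minimal, runnable sketch of that brute-force idea in Python (all names are made up; note that enumerating every combination is exponential in the number of movies, which is why the answers below recommend heuristics instead):

from itertools import combinations

def pack_dvds(movie_sizes, dvd_size=4.38):
    # Repeatedly pick the subset of remaining movies with the largest total
    # size that still fits on one DVD, then remove those movies from the list.
    # WARNING: enumerating all subsets is exponential in the number of movies.
    remaining = list(movie_sizes)
    dvds = []
    while remaining:
        best, best_size = [], 0.0
        for r in range(1, len(remaining) + 1):
            for combo in combinations(remaining, r):
                size = sum(combo)
                if best_size < size <= dvd_size:
                    best, best_size = list(combo), size
        if not best:  # a single movie is larger than a DVD
            raise ValueError("movie larger than a DVD: %r" % max(remaining))
        dvds.append(best)
        for m in best:
            remaining.remove(m)
    return dvds

# pack_dvds([1.2, 0.7, 4.3, 2.2, 2.0])  ->  [[4.3], [2.2, 2.0], [1.2, 0.7]]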
first time to post a question..so go easy on me guys!!
Thanks in advance
P.S. I figured out a way to do it using Dijkstra's algorithm, which I think would be fast but not memory-friendly. If anybody is interested, I would gladly discuss it.
You should really stick to common bin-packing heuristics. The Wikipedia article gives a good overview of approaches, including links to problem-tailored exact methods. But always keep in mind: it's an NP-hard problem!
I will show you an example supporting my hint that you should stick to heuristics.
The following Python code:
creates parameterized random problems (movie sizes drawn from a mixture of normal distributions; rejection sampling makes sure that no movie is bigger than a DVD)
uses some random binpacking library which implements a greedy heuristic (I didn't try or test this lib before, so no guarantees! No idea which heuristic is used.)
uses a naive mixed-integer programming formulation (which is solved by a commercial solver; open-source solvers like CBC struggle, but might be used for good approximate solutions)
Code
import numpy as np
from cvxpy import *
from time import time

""" Generate some test-data """
np.random.seed(1)
N = 150  # movies
means = [700, 1400, 4300]
stds = [100, 300, 500]
DVD_SIZE = 4400

movies = []
for movie in range(N):
    while True:
        random_mean_index = np.random.randint(low=0, high=len(means))
        random_size = np.random.randn() * stds[random_mean_index] + means[random_mean_index]
        if random_size <= DVD_SIZE:
            movies.append(random_size)
            break

""" HEURISTIC SOLUTION """
import binpacking
start = time()
bins = binpacking.to_constant_volume(movies, DVD_SIZE)
end = time()
print('Heuristic solution: ')
for b in bins:
    print(b)
print('used bins: ', len(bins))
print('used time (seconds): ', end - start)

""" Preprocessing """
movie_sizes_sorted = sorted(movies)
max_movies_per_dvd = 0
occupied = 0
for i in range(N):
    if occupied + movie_sizes_sorted[i] <= DVD_SIZE:
        max_movies_per_dvd += 1
        occupied += movie_sizes_sorted[i]
    else:
        break

""" Solve problem """
# Variables
X = Bool(N, N)  # N * number-DVDS
I = Bool(N)     # indicator: DVD used

# Constraints
constraints = []
# (1) DVDs not overfilled
for dvd in range(N):
    constraints.append(sum_entries(mul_elemwise(movies, X[:, dvd])) <= DVD_SIZE)
# (2) All movies distributed exactly once
for movie in range(N):
    constraints.append(sum_entries(X[movie, :]) == 1)
# (3) Indicators
for dvd in range(N):
    constraints.append(sum_entries(X[:, dvd]) <= I[dvd] * (max_movies_per_dvd + 1))

# Objective
objective = Minimize(sum_entries(I))

# Problem
problem = Problem(objective, constraints)
start = time()
problem.solve(solver=GUROBI, MIPFocus=1, verbose=True)
# problem.solve(solver=CBC, CliqueCuts=True, GomoryCuts=True, KnapsackCuts=True,
#               MIRCuts=True, ProbingCuts=True, FlowCoverCuts=True,
#               LiftProjectCuts=True, verbose=True)
end = time()

""" Print solution """
for dvd in range(N):
    movies_ = []
    for movie in range(N):
        if np.isclose(X.value[movie, dvd], 1):
            movies_.append(movies[movie])
    if movies_:
        print('DVD')
        for movie in movies_:
            print(' movie with size: ', movie)
print('Distributed ', N, ' movies to ', int(objective.value), ' dvds')
print('Optimization took (seconds): ', end - start)
Partial Output
Heuristic solution:
-------------------
('used bins: ', 60)
('used time (seconds): ', 0.0045168399810791016)
MIP-approach:
-------------
Root relaxation: objective 2.142857e+01, 1921 iterations, 0.10 seconds
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
0 0 21.42857 0 120 106.00000 21.42857 79.8% - 0s
H 0 0 68.0000000 21.42857 68.5% - 0s
H 0 0 63.0000000 21.42857 66.0% - 0s
0 0 21.42857 0 250 63.00000 21.42857 66.0% - 1s
H 0 0 62.0000000 21.42857 65.4% - 2s
0 0 21.42857 0 256 62.00000 21.42857 65.4% - 2s
0 0 21.42857 0 304 62.00000 21.42857 65.4% - 2s
0 0 21.42857 0 109 62.00000 21.42857 65.4% - 3s
0 2 21.42857 0 108 62.00000 21.42857 65.4% - 4s
40 2 27.61568 20 93 62.00000 27.61568 55.5% 110 5s
H 156 10 61.0000000 58.00000 4.92% 55.3 8s
262 4 59.00000 84 61 61.00000 59.00000 3.28% 44.2 10s
413 81 infeasible 110 61.00000 59.00000 3.28% 37.2 15s
H 417 78 60.0000000 59.00000 1.67% 36.9 15s
1834 1212 59.00000 232 40 60.00000 59.00000 1.67% 25.7 20s
...
...
57011 44660 infeasible 520 60.00000 59.00000 1.67% 27.1 456s
57337 44972 59.00000 527 34 60.00000 59.00000 1.67% 27.1 460s
58445 45817 59.00000 80 94 60.00000 59.00000 1.67% 26.9 466s
59387 46592 59.00000 340 65 60.00000 59.00000 1.67% 26.8 472s
Analysis
Some observations regarding the example above:
the heuristic obtains a solution of value 60 instantly
the commercial solver needs more time but also finds a solution of value 60 (after about 15 seconds)
it then keeps trying to find a better solution or to prove that none exists (MIP solvers are complete: given infinite time they find the optimal solution or a proof of optimality!)
no progress for some time!
but: we have a proof that no solution can use fewer than 59 DVDs
= maybe you could save one DVD by solving the problem to optimality; but such a solution is hard to find and we don't know (yet) whether it even exists!
Remarks
The observations above depend heavily on the statistics of the data
It's easy to try other (maybe smaller) problems where the commercial MIP solver finds a solution which uses 1 DVD less (e.g. 49 vs. 50)
It's often not worth it (and remember: open-source solvers struggle even more)
The formulation is very simple and not tuned at all (don't blame only the solvers)
There are exact algorithms (which might be much more complex to implement) which could be appropriate
Conclusion
Heuristics are very easy to implement and provide very good solutions in general. Most of them also come with very good theoretical guarantees (e.g. first-fit decreasing uses at most 11/9 · OPT + 1 DVDs compared to the optimal solution). Despite the fact that I'm keen on optimization in general, I would probably use the heuristics approach here.
The general problem is also very popular, so there should exist a good library for this problem in many programming languages! A minimal first-fit-decreasing sketch follows.
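For illustration, a minimal first-fit-decreasing sketch in Python (names are made up; this is the heuristic with the 11/9 · OPT + 1 guarantee mentioned above):

def first_fit_decreasing(sizes, capacity):
    # Sort the items by size (largest first) and put each one into the first
    # bin that still has room; open a new bin if none fits.
    bins = []    # each bin is a list of item sizes
    space = []   # remaining capacity of each bin
    for size in sorted(sizes, reverse=True):
        for i, free in enumerate(space):
            if size <= free:
                bins[i].append(size)
                space[i] -= size
                break
        else:
            bins.append([size])
            space.append(capacity - size)
    return bins

# e.g. with the data generated above: print(len(first_fit_decreasing(movies, DVD_SIZE)))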
Without claiming that the solution this answer presents is optimized, optimal, or possesses any other remarkable qualities, here is a greedy approach to the DVD packing problem.
import System.Random
import Data.List
import Data.Ord

-- F# programmers are so used to this operator, I just cannot live without it ...yet.
(|>) a b = b a

data Dvd = Dvd { duration :: Int, movies :: [Int] } deriving (Show,Eq)

dvdCapacity = 1000 :: Int  -- a dvd has capacity for 1000 time units - arbitrary unit
-- the duration of a movie is between 1 and 1000 time units
r = randomRs (1,1000) (mkStdGen 42) :: [Int]
-- our test data set of 1000 movies, random movie durations drawn from r
allMovies = zip [1..1000] (take 1000 r)
allMoviesSorted = reverse $ sortBy (comparing snd) allMovies

remainingCapacity dvd = dvdCapacity - duration dvd
emptyDvd = Dvd { duration = 0, movies = [] }

-- from the remaining movies, pick the largest one with at most maxDuration length.
pickLargest remaining maxDuration =
    let (left,right) = remaining |> break (\ (a,b) -> b <= maxDuration)
        (h,t) = case right of
                  []        -> (Nothing,[])
                  otherwise -> (Just (head right), right |> tail)
    in
        (h, [left, t] |> concat)

-- add a track (movie) to a dvd
addTrack dvd track =
    Dvd { duration = (duration dvd) + snd track,
          movies = fst track : (movies dvd) }

-- pick dvd from dvds with largest remaining capacity
-- and add the largest remaining fitting track
greedyPack movies dvds
    | movies == [] = (dvds,[])
    | otherwise =
        let dvds' = reverse $ sortBy (comparing remainingCapacity) dvds
            (picked,movies') =
                case dvds' of
                  []     -> (Nothing, movies)
                  (x:xs) -> pickLargest movies (remainingCapacity x)
        in
            case picked of
              Nothing ->
                  -- None of the current dvds had enough capacity remaining
                  -- to pick another movie and add it. -> Add new empty dvd
                  -- and run greedyPack again.
                  greedyPack movies' (emptyDvd : dvds')
              Just p ->
                  -- The best fitting movie could be added to the
                  -- dvd with the largest remaining capacity.
                  greedyPack movies' (addTrack (head dvds') p : tail dvds')

(result,residual) = greedyPack allMoviesSorted [emptyDvd]

usedDvdsCount = length result
totalPlayTime = allMovies |> foldl (\ s (i,d) -> s + d) 0
optimalDvdsCount = round $ 0.5 + fromIntegral totalPlayTime / fromIntegral dvdCapacity
solutionQuality = length result - optimalDvdsCount
Compared to the theoretical optimal dvd count it wastes 4 extra dvds on the given data set.
Related
I have a list of items, a, b, c,..., each of which has a weight and a value.
The 'ordinary' Knapsack algorithm will find the selection of items that maximises the value of the selected items, whilst ensuring that the weight is below a given constraint.
The problem I have is slightly different. I wish to minimise the value (easy enough by using the reciprocal of the value), whilst ensuring that the weight is at least the value of the given constraint, not less than or equal to the constraint.
I have tried re-routing the idea through the ordinary Knapsack algorithm, but this can't be done. I was hoping there is another combinatorial algorithm that I am not aware of that does this.
In the German wiki it's formalized as:
finite set of objects U
w: weight-function
v: value-function
w: U -> R
v: U -> R
B in R # constraint rhs
Find subset K in U subject to:
    sum( w(u) ) <= B    | over all u in K
such that:
    max sum( v(u) )     | over all u in K
So there is no restriction like nonnegativity.
Just use negative weights, negative values and a negative B.
The basic concept is:
 sum( w(u) ) <= B    | over all u in K
<->
-sum( w(u) ) >= -B   | over all u in K
So in your case:
classic constraint:  x0 + x1 <= B   |  3 + 7 <= 12   Y  |  3 + 10 <= 12   N
becomes:            -x0 - x1 <= -B  | -3 - 7 <= -12  N  | -3 - 10 <= -12  Y
So for a given implementation it depends on the software if this is allowed. In terms of the optimization-problem, there is no problem. The integer-programming formulation for your case is as natural as the classic one (and bounded).
Python Demo based on Integer-Programming
Code
import numpy as np
import scipy.sparse as sp
from cylp.cy import CyClpSimplex
np.random.seed(1)
""" INSTANCE """
weight = np.random.randint(50, size = 5)
value = np.random.randint(50, size = 5)
capacity = 50
""" SOLVE """
n = weight.shape[0]
model = CyClpSimplex()
x = model.addVariable('x', n, isInt=True)
model.objective = value # MODIFICATION: default = minimize!
model += sp.eye(n) * x >= np.zeros(n) # could be improved
model += sp.eye(n) * x <= np.ones(n) # """
model += np.matrix(-weight) * x <= -capacity # MODIFICATION
cbcModel = model.getCbcModel()
cbcModel.logLevel = True
status = cbcModel.solve()
x_sol = np.array(cbcModel.primalVariableSolution['x'].round()).astype(int) # assumes existence
print("INSTANCE")
print(" weights: ", weight)
print(" values: ", value)
print(" capacity: ", capacity)
print("Solution")
print(x_sol)
print("sum weight: ", x_sol.dot(weight))
print("value: ", x_sol.dot(value))
Small remarks
This code is just a demo using a somewhat low-level library; there are other tools available which might be better suited (e.g. on Windows: pulp)
it's the classic integer-programming formulation from the wiki, modified as mentioned above
it will scale very well as the underlying solver is pretty good
as written, it's solving the 0-1 knapsack (only variable bounds would need to be changed)
Small look at the core-code:
# create model
model = CyClpSimplex()
# create one variable for each how-often-do-i-pick-this-item decision
# variable needs to be integer (or binary for 0-1 knapsack)
x = model.addVariable('x', n, isInt=True)
# the objective value of our IP: a linear-function
# cylp only needs the coefficients of this function: c0*x0 + c1*x1 + c2*x2...
# we only need our value vector
model.objective = value # MODIFICATION: default = minimize!
# WARNING: typically one should always use variable-bounds
# (cylp problems...)
# workaround: express bounds lower_bound <= var <= upper_bound as two constraints
# a constraint is an affine-expression
# sp.eye creates a sparse-diagonal with 1's
# example: sp.eye(3) * x >= 5
# 1 0 0 -> 1 * x0 + 0 * x1 + 0 * x2 >= 5
# 0 1 0 -> 0 * x0 + 1 * x1 + 0 * x2 >= 5
# 0 0 1 -> 0 * x0 + 0 * x1 + 1 * x2 >= 5
model += sp.eye(n) * x >= np.zeros(n) # could be improved
model += sp.eye(n) * x <= np.ones(n) # """
# cylp somewhat outdated: need numpy's matrix class
# apart from that it's just the weight-constraint as defined at wiki
# same affine-expression as above (but only a row-vector-like matrix)
model += np.matrix(-weight) * x <= -capacity # MODIFICATION
# internal conversion of type needed to treat it as an IP (or else it would be an LP)
cbcModel = model.getCbcModel()
cbcModel.logLevel = True
status = cbcModel.solve()
# type-casting
x_sol = np.array(cbcModel.primalVariableSolution['x'].round()).astype(int)
Output
Welcome to the CBC MILP Solver
Version: 2.9.9
Build Date: Jan 15 2018
command line - ICbcModel -solve -quit (default strategy 1)
Continuous objective value is 4.88372 - 0.00 seconds
Cgl0004I processed model has 1 rows, 4 columns (4 integer (4 of which binary)) and 4 elements
Cutoff increment increased from 1e-05 to 0.9999
Cbc0038I Initial state - 0 integers unsatisfied sum - 0
Cbc0038I Solution found of 5
Cbc0038I Before mini branch and bound, 4 integers at bound fixed and 0 continuous
Cbc0038I Mini branch and bound did not improve solution (0.00 seconds)
Cbc0038I After 0.00 seconds - Feasibility pump exiting with objective of 5 - took 0.00 seconds
Cbc0012I Integer solution of 5 found by feasibility pump after 0 iterations and 0 nodes (0.00 seconds)
Cbc0001I Search completed - best objective 5, took 0 iterations and 0 nodes (0.00 seconds)
Cbc0035I Maximum depth 0, 0 variables fixed on reduced cost
Cuts at root node changed objective from 5 to 5
Probing was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Gomory was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Knapsack was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Clique was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
MixedIntegerRounding2 was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
FlowCover was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
TwoMirCuts was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Result - Optimal solution found
Objective value: 5.00000000
Enumerated nodes: 0
Total iterations: 0
Time (CPU seconds): 0.00
Time (Wallclock seconds): 0.00
Total time (CPU seconds): 0.00 (Wallclock seconds): 0.00
INSTANCE
weights: [37 43 12 8 9]
values: [11 5 15 0 16]
capacity: 50
Solution
[0 1 0 1 0]
sum weight: 51
value: 5
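For comparison, here is a hedged sketch of a plain dynamic-programming approach to the same variant (minimize total value subject to total weight >= capacity): it is the standard 0-1 knapsack DP with the weight axis capped at the capacity; the function name is made up.

def min_value_weight_at_least(weights, values, capacity):
    # dp[w] = minimum total value of a selection whose total weight,
    # capped at `capacity`, equals w; the answer is dp[capacity].
    INF = float('inf')
    dp = [INF] * (capacity + 1)
    dp[0] = 0
    for wt, val in zip(weights, values):
        for w in range(capacity, -1, -1):   # descending: each item used at most once
            if dp[w] < INF:
                target = min(capacity, w + wt)
                dp[target] = min(dp[target], dp[w] + val)
    return dp[capacity]

# Instance from above; should reproduce the objective of 5 reported by CBC:
# print(min_value_weight_at_least([37, 43, 12, 8, 9], [11, 5, 15, 0, 16], 50))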
There is a sequence S.
All the elements of S are products of powers of 2, 3 and 5.
S = {2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24 ...}
How do I get the 1000th element of this sequence efficiently?
I checked each number starting from 1, but this method is too slow.
A geometric approach:
Let s = 2^i . 3^j . 5^k, where the triple (i, j, k) belongs to the first octant of a 3D state space.
Taking the logarithm,
ln(s) = i.ln(2) + j.ln(3) + k.ln(5)
so that in the state space the iso-s surfaces are planes, which intersect the first octant along a triangle. On the other hand, the feasible solutions are the nodes of a square grid.
If one wants to produce the s-values in increasing order, one can keep a list of the grid nodes closest to the current s-plane*, on its "greater than" side.
If I am right, to move from one s-value to the next, it suffices to discard the current (i, j, k) and replace it by the three triples (i+1, j, k), (i, j+1, k) and (i, j, k+1), unless they are already there, and pick the next smallest s.
An efficient implementation will be by storing the list as a binary tree with the log(s)-value as the key.
If you are asking for the first N values, you will explore a pyramidal volume of state-space of height O(³√N), and base area O(³√N²), which is the number of tree nodes, hence the spatial complexity. Every query in the tree will take O(log(N)) comparisons (and O(1) operations to fetch the minimum), for a total of O(N.log(N)).
*More precisely, the list will contain all triples on the "greater than" side and such that no index can be decreased without getting on the other side of the plane.
Here is Python code that implements these ideas.
You will notice that the logarithms are converted to fixed point (7 decimal places) to avoid floating-point inaccuracies that could result in log(s) values not being found equal. This causes the s values to be inexact in the last digits, but this does not matter as long as the ordering of the values is preserved. Recomputing the s values from the indexes yields exact values.
import math
import bintrees

# Constants
ln2 = round(10000000 * math.log(2))
ln3 = round(10000000 * math.log(3))
ln5 = round(10000000 * math.log(5))

# Initial list
t = bintrees.FastAVLTree()
t.insert(0, (0, 0, 0))

# Find the N first products
N = 100
for i in range(N):
    # Current s
    s = t.pop_min()
    print math.pow(2, s[1][0]) * math.pow(3, s[1][1]) * math.pow(5, s[1][2])
    # Update the list
    if not s[0] + ln2 in t:
        t.insert(s[0] + ln2, (s[1][0]+1, s[1][1], s[1][2]))
    if not s[0] + ln3 in t:
        t.insert(s[0] + ln3, (s[1][0], s[1][1]+1, s[1][2]))
    if not s[0] + ln5 in t:
        t.insert(s[0] + ln5, (s[1][0], s[1][1], s[1][2]+1))
The 100 first values are
1 2 3 4 5 6 8 9 10 12
15 16 18 20 24 25 27 30 32 36
40 45 48 50 54 60 64 72 75 80
81 90 96 100 108 120 125 128 135 144
150 160 162 180 192 200 216 225 240 243
250 256 270 288 300 320 324 360 375 384
400 405 432 450 480 486 500 512 540 576
600 625 640 648 675 720 729 750 768 800
810 864 900 960 972 1000 1024 1080 1125 1152
1200 1215 1250 1280 1296 1350 1440 1458 1500 1536
The plot of the number of tree nodes confirms the O(³√N²) spatial behavior.
Update:
When there is no risk of overflow, a much simpler version (not using logarithms) is possible:
import math
import bintrees

# Initial list
t = bintrees.FastAVLTree()
t[1] = None

# Find the N first products
N = 100
for i in range(N):
    # Current s
    (s, r) = t.pop_min()
    print s
    # Update the list
    t[2 * s] = None
    t[3 * s] = None
    t[5 * s] = None
Simply put, you just have to generate each i-th number consecutively. Let's call the set {2, 3, 5} Z. At the i-th iteration, assume you have all (i-1) values generated in the previous iterations. While generating the next one, what you basically have to do is try all the elements in Z and, for each of them, generate the least element they can form that is larger than the element generated at the (i-1)-th iteration. Then you simply take the smallest one among them as the i-th value. A simple and not so efficient implementation is given below.
def generate_simple(N, Z):
    generated = [1]
    for i in range(1, N+1):
        minFound = -1
        minElem = -1
        for j in range(0, len(Z)):
            for k in range(0, len(generated)):
                candidateVal = Z[j] * generated[k]
                if candidateVal > generated[-1]:
                    if minFound == -1 or minFound > candidateVal:
                        minFound = candidateVal
                        minElem = j
                    break
        generated.append(minFound)
    return generated[-1]
As you may observe, this approach has a time complexity of O(N^2 * |Z|). An improvement in terms of efficiency would be to store, in a second array indicesToStart, where we left off scanning the array of generated values for each element of Z. Then, for each element we only scan all N values of the generated array once over the whole run of the algorithm, which means the time complexity after such an improvement is O(N * |Z|).
A simple implementation of the improvement based on the simple version provided above, is given below.
def generate_improved(N, Z):
    generated = [1]
    indicesToStart = [0] * len(Z)
    for i in range(1, N+1):
        minFound = -1
        minElem = -1
        for j in range(0, len(Z)):
            for k in range(indicesToStart[j], len(generated)):
                candidateVal = Z[j] * generated[k]
                if candidateVal > generated[-1]:
                    if minFound == -1 or minFound > candidateVal:
                        minFound = candidateVal
                        minElem = j
                    break
                indicesToStart[j] += 1
        generated.append(minFound)
        indicesToStart[minElem] += 1
    return generated[-1]
If you have a hard time understanding how complexity decreases with this algorithm, try looking into the difference in time complexity of any graph traversal algorithm when an adjacency list is used, and when an adjacency matrix is used. The improvement adjacency lists help achieve is almost exactly the same kind of improvement we get here. In a nutshell, you have an index for each element and instead of starting to scan from the beginning you continue from wherever you left the last time you scanned the generated array for that element. Consequently, even though there are N iterations in the algorithm(i.e. the outermost loop) the overall number of operations you make is O(N * |Z|).
Important Note: All the code above is a simple implementation for demonstration purposes, and you should consider it just as a pseudocode you can test. While implementing this in real life, based on the programming language you choose to use, you will have to consider issues like integer overflow when computing candidateVal.
I tried to use VW to train a regression model on a small set of examples (about 3112). I think I'm doing it correctly, yet it showed me weird results. Dug around but didn't find anything helpful.
$ cat sh600000.feat | vw --l1 1e-8 --l2 1e-8 --readable_model model -b 24 --passes 10 --cache_file cache
using l1 regularization = 1e-08
using l2 regularization = 1e-08
Num weight bits = 24
learning rate = 0.5
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
using cache_file = cache
ignoring text input in favor of cache input
num sources = 1
average since example example current current current
loss last counter weight label predict features
0.040000 0.040000 1 1.0 -0.2000 0.0000 79
0.051155 0.062310 2 2.0 0.2000 -0.0496 79
0.046606 0.042056 4 4.0 0.4100 0.1482 79
0.052160 0.057715 8 8.0 0.0200 0.0021 78
0.064936 0.077711 16 16.0 -0.1800 0.0547 77
0.060507 0.056079 32 32.0 0.0000 0.3164 79
0.136933 0.213358 64 64.0 -0.5900 -0.0850 79
0.151692 0.166452 128 128.0 0.0700 0.0060 79
0.133965 0.116238 256 256.0 0.0900 -0.0446 78
0.179995 0.226024 512 512.0 0.3700 -0.0217 79
0.109296 0.038597 1024 1024.0 0.1200 -0.0728 79
0.579360 1.049425 2048 2048.0 -0.3700 -0.0084 79
0.485389 0.485389 4096 4096.0 1.9600 0.3934 79 h
0.517748 0.550036 8192 8192.0 0.0700 0.0334 79 h
finished run
number of examples per pass = 2847
passes used = 5
weighted example sum = 14236
weighted label sum = -155.98
average loss = 0.490685 h
best constant = -0.0109567
total feature number = 1121506
$ wc model
41 48 657 model
Questions:
Why is the number of features in the output (readable) model less than the number of actual features? I counted that the training data contains 78 features (plus the bias that's 79 as shown during the training). The number of feature bits is 24, which should be far more than enough to avoid collision.
Why does the average loss actually go up during training, as you can see in the above example?
(Minor) I tried to increase the number of feature bits to 32, and it output an empty model. Why?
EDIT:
I tried to shuffle the input file, as well as using --holdout_off, as suggested. But the result is still almost the same - the average loss goes up.
$ cat sh600000.feat.shuf | vw --l1 1e-8 --l2 1e-8 --readable_model model -b 24 --passes 10 --cache_file cache --holdout_off
using l1 regularization = 1e-08
using l2 regularization = 1e-08
Num weight bits = 24
learning rate = 0.5
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
using cache_file = cache
ignoring text input in favor of cache input
num sources = 1
average since example example current current current
loss last counter weight label predict features
0.040000 0.040000 1 1.0 -0.2000 0.0000 79
0.051155 0.062310 2 2.0 0.2000 -0.0496 79
0.046606 0.042056 4 4.0 0.4100 0.1482 79
0.052160 0.057715 8 8.0 0.0200 0.0021 78
0.071332 0.090504 16 16.0 0.0300 0.1203 79
0.043720 0.016108 32 32.0 -0.2200 -0.1971 78
0.142895 0.242071 64 64.0 0.0100 -0.1531 79
0.158564 0.174232 128 128.0 0.0500 -0.0439 79
0.150691 0.142818 256 256.0 0.3200 0.1466 79
0.197050 0.243408 512 512.0 0.2300 -0.0459 79
0.117398 0.037747 1024 1024.0 0.0400 0.0284 79
0.636949 1.156501 2048 2048.0 1.2500 -0.0152 79
0.363364 0.089779 4096 4096.0 0.1800 0.0071 79
0.477569 0.591774 8192 8192.0 -0.4800 0.0065 79
0.411068 0.344567 16384 16384.0 0.0700 0.0450 77
finished run
number of examples per pass = 3112
passes used = 10
weighted example sum = 31120
weighted label sum = -105.5
average loss = 0.423404
best constant = -0.0033901
total feature number = 2451800
The training examples are unique to each other, so I doubt there is an over-fitting problem (which, as I understand it, usually happens when the number of examples is too small compared to the number of features).
EDIT2:
I tried to print the average loss for every pass of examples, and saw that it mostly remains constant.
$ cat dist/sh600000.feat | vw --l1 1e-8 --l2 1e-8 -f dist/model -P 3112 --passes 10 -b 24 --cache_file dist/cache
using l1 regularization = 1e-08
using l2 regularization = 1e-08
Num weight bits = 24
learning rate = 0.5
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
final_regressor = dist/model
using cache_file = dist/cache
ignoring text input in favor of cache input
num sources = 1
average since example example current current current
loss last counter weight label predict features
0.498822 0.498822 3112 3112.0 0.0800 0.0015 79 h
0.476677 0.454595 6224 6224.0 -0.2200 -0.0085 79 h
0.466413 0.445856 9336 9336.0 0.0200 -0.0022 79 h
0.490221 0.561506 12448 12448.0 0.0700 -0.1113 79 h
finished run
number of examples per pass = 2847
passes used = 5
weighted example sum = 14236
weighted label sum = -155.98
average loss = 0.490685 h
best constant = -0.0109567
total feature number = 1121506
Also another try without the --l1, --l2 and -b parameters:
$ cat dist/sh600000.feat | vw -f dist/model -P 3112 --passes 10 --cache_file dist/cacheNum weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
final_regressor = dist/model
using cache_file = dist/cache
ignoring text input in favor of cache input
num sources = 1
average since example example current current current
loss last counter weight label predict features
0.520286 0.520286 3112 3112.0 0.0800 -0.0021 79 h
0.488581 0.456967 6224 6224.0 -0.2200 -0.0137 79 h
0.474247 0.445538 9336 9336.0 0.0200 -0.0299 79 h
0.496580 0.563450 12448 12448.0 0.0700 -0.1727 79 h
0.533413 0.680958 15560 15560.0 -0.1700 0.0322 79 h
0.524531 0.480201 18672 18672.0 -0.9800 -0.0573 79 h
finished run
number of examples per pass = 2801
passes used = 7
weighted example sum = 19608
weighted label sum = -212.58
average loss = 0.491739 h
best constant = -0.0108415
total feature number = 1544713
Does that mean it's normal for the average loss to go up during one pass, but as long as multiple passes get the same loss then it's fine?
The model file stores only non-zero weights, so most likely the others got zeroed out, especially if you are using --l1.
It may be caused by many reasons. Perhaps your dataset isn't shuffled well enough. If you sort your dataset so that examples labeled -1 are in the first half and examples labeled 1 are in the second, then your model will show very good convergence on the first half, but you'll see the average loss bump up as it reaches the 2nd half. So it may be an imbalance in the dataset. As for the last two losses - these are holdout losses (marked with 'h' at the end of the line) and may indicate that the model is overfitted. Please refer to my other answer.
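If shuffling is the suspected culprit, here is a tiny Python sketch of shuffling the example file on disk before the vw run (file names taken from the question; nothing VW-specific here):

import random

with open('sh600000.feat') as f:
    lines = f.readlines()
random.shuffle(lines)
with open('sh600000.feat.shuf', 'w') as f:
    f.writelines(lines)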
Well, in the master branch usage of -b 32 is even currently blocked. You should use up to -b 31. In practice -b 24-28 is usually enough even for dozens of thousands of features.
I would recommend you get an up-to-date VW version from GitHub.
Suppose that you time a program as a function of N and produce
the following table.
N seconds
-------------------
19683 0.00
59049 0.00
177147 0.01
531441 0.08
1594323 0.44
4782969 2.46
14348907 13.58
43046721 74.99
129140163 414.20
387420489 2287.85
Estimate the order of growth of the running time as a function of N.
Assume that the running time obeys a power law T(N) ~ a N^b. For your
answer, enter the constant b. Your answer will be marked as correct
if it is within 1% of the target answer - we recommend using
two digits after the decimal separator, e.g., 2.34.
Can someone explain how to calculate this?
Well, it is a simple mathematical problem.
I  : a * 387420489^b = 2287.85  ->  a = 2287.85 / 387420489^b
II : a * 43046721^b  = 74.99    ->  a = 74.99 / 43046721^b
III (I and II): 2287.85 / 387420489^b = 74.99 / 43046721^b
    -> (387420489 / 43046721)^b = 2287.85 / 74.99
    -> 9^b ≈ 30.51  ->  b = ln(30.51) / ln(9) ≈ 1.56
Use logarithms to solve; see http://www.purplemath.com/modules/solvexpo2.htm
1. You should calculate the ratio of growth from one row to the next
N seconds
--------------------
14348907 13.58
43046721 74.99
129140163 414.2
387420489 2287.85
2. Calculate the ratio of change for N
43046721 / 14348907 = 3
129140163 / 43046721 = 3
therefore the rate of change for N is 3.
3. Calculate the ratio of change for seconds
74.99 / 13.58 = 5.52
Now let's check the ratio between one more pair of rows to be sure
414.2 / 74.99 = 5.52
so the ratio of change for seconds is 5.52
4. Build the following equation
3^b = 5.52
b = 1.55
Finally we get b = 1.55, so the running time grows roughly as N^1.55.
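A small Python sketch of the same computation (assuming, as above, that N grows by a factor of 3 from row to row):

import math

timings = [13.58, 74.99, 414.20, 2287.85]   # last four rows of the table
ratios = [t2 / t1 for t1, t2 in zip(timings, timings[1:])]   # each is ~5.52
b = math.log(sum(ratios) / len(ratios)) / math.log(3)
print(round(b, 2))   # ~1.56, in line with the 1.55 estimate above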
I just finished participating in the 2009 ACM ICPC Programming Contest in the Latin American Finals. These questions were for Brazil, Bolivia, Chile, etc.
My team and I could only finish two questions out of the eleven (not bad I think for the first try).
Here's one we could finish. I'm curious to see any variations on the code. The question in full (PS: these questions can also be found on the official ICPC website, available to everyone):
In the land of ACM ruled a great king who became obsessed with order. The kingdom had a rectangular form, and the king divided the territory into a grid of small rectangular counties. Before dying the king distributed the counties among his sons.
The king was unaware of the rivalries between his sons: the first heir hated the second but not the rest, the second hated the third but not the rest, and so on... Finally, the last heir hated the first heir, but not the other heirs.
As soon as the king died, the strange rivalry among the king's sons sparked off a generalized war in the kingdom. Attacks only took place between pairs of adjacent counties (adjacent counties are those that share one vertical or horizontal border). A county X attacked an adjacent county Y whenever X hated Y. The attacked county was always conquered. All attacks were carried out simultaneously, and a set of simultaneous attacks was called a battle. After a certain number of battles, the surviving sons made a truce and never battled again.
For example if the king had three sons, named 0, 1 and 2, the figure below shows what happens in the first battle for a given initial land distribution:
INPUT
The input contains several test cases. The first line of a test case contains four integers, N, R, C and K.
N - The number of heirs (2 <= N <= 100)
R and C - The dimensions of the land. (2 <= R,C <= 100)
K - Number of battles that are going to take place. (1 <= K <= 100)
Heirs are identified by sequential integers starting from zero. Each of the next R lines contains C integers HeirIdentificationNumber (saying what heir owns this land) separated by single spaces. This is to layout the initial land.
The last test case is followed by a line with four zeroes separated by single spaces. (To exit the program, so to speak.)
Output
For each test case your program must print R lines with C integers each, separated by single spaces in the same format as the input, representing the land distribution after all battles.
Sample Input:        Sample Output:
3 4 4 3              2 2 2 0
0 1 2 0              2 1 0 1
1 0 2 0              2 2 2 0
0 1 2 0              0 2 0 0
0 1 2 2
Another example:
Sample Input:        Sample Output:
4 2 3 4              1 0 3
1 0 3                2 1 2
2 1 2
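For reference, an ungolfed Python sketch of the simulation described in the statement (names are made up; it reads the input format above and applies K simultaneous battles per test case):

import sys

def battle(grid, n_heirs):
    # One simultaneous battle: a county owned by x is conquered by any
    # adjacent county owned by (x - 1) mod n_heirs (the heir who hates x).
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            enemy = (grid[r][c] - 1) % n_heirs
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and grid[rr][cc] == enemy:
                    new[r][c] = enemy
                    break
    return new

def main():
    data = sys.stdin.read().split()
    pos = 0
    while True:
        n, r, c, k = (int(t) for t in data[pos:pos + 4])
        pos += 4
        if n == 0:
            break
        grid = [[int(t) for t in data[pos + i * c:pos + (i + 1) * c]] for i in range(r)]
        pos += r * c
        for _ in range(k):
            grid = battle(grid, n)
        for row in grid:
            print(' '.join(map(str, row)))

if __name__ == '__main__':
    main()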
Perl, 233 char
{$_=<>;($~,$R,$C,$K)=split;if($~){@A=map{$_=<>;split}1..$R;$x=0,
@A=map{$r=0;for$d(-$C,$C,1,-1){$r|=($y=$x+$d)>=0&$y<@A&1==($_-$A[$y])%$~
if($p=(1+$x)%$C)>1||1-$d-2*$p}$x++;($_-$r)%$~}@A
while$K--;print"@a\n"while@a=splice@A,0,$C;redo}}
The map is held in a one-dimensional array. This is less elegant than the two-dimensional solution, but it is also shorter. Contains the idiom @A=map{...}@A where all the fighting goes on inside the braces.
Python (420 characters)
I haven't played with code golf puzzles in a while, so I'm sure I missed a few things:
import sys
H,R,C,B=map(int,raw_input().split())
M=(1,0),(0,1),(-1,0),(0,-1)
l=[map(int,r.split())for r in sys.stdin]
n=[r[:]for r in l[:]]
def D(r,c):
    x=l[r][c]
    a=[l[r+mr][c+mc]for mr,mc in M if 0<=r+mr<R and 0<=c+mc<C]
    if x==0and H-1in a:n[r][c]=H-1
    elif x-1in a:n[r][c]=x-1
    else:n[r][c]=x
G=range
for i in G(B):
    for r in G(R):
        for c in G(C):D(r,c)
    l=[r[:] for r in n[:]]
for r in l:print' '.join(map(str,r))
Lua, 291 Characters
g=loadstring("return io.read('*n')")repeat n=g()r=g()c=g()k=g()l={}c=c+1 for
i=0,k do w={}for x=1,r*c do a=l[x]and(l[x]+n-1)%n w[x]=i==0 and x%c~=0 and
g()or(l[x-1]==a or l[x+1]==a or l[x+c]==a or l[x-c]==a)and a or
l[x]io.write(i~=k and""or x%c==0 and"\n"or w[x].." ")end l=w end until n==0
F#, 675 chars
let R()=System.Console.ReadLine().Split([|' '|])|>Array.map int
let B(a:int[][]) r c g=
    let n=Array.init r (fun i->Array.copy a.[i])
    for i in 1..r-2 do for j in 1..c-2 do
                        let e=a.[i].[j]-1
                        let e=if -1=e then g else e
                        if a.[i-1].[j]=e||a.[i+1].[j]=e||a.[i].[j-1]=e||a.[i].[j+1]=e then
                            n.[i].[j]<-e
    n
let mutable n,r,c,k=0,0,0,0
while(n,r,c,k)<>(0,2,2,0)do
    let i=R()
    n<-i.[0]
    r<-i.[1]+2
    c<-i.[2]+2
    k<-i.[3]
    let mutable a=Array.init r (fun i->
        if i=0||i=r-1 then Array.create c -2 else[|yield -2;yield!R();yield -2|])
    for j in 1..k do a<-B a r c (n-1)
    for i in 1..r-2 do
        for j in 1..c-2 do
            printf "%d" a.[i].[j]
        printfn ""
Make the array big enough to put an extra border of "-2" around the outside - this way we can look left/up/right/down without worrying about out-of-bounds exceptions.
B() is the battle function; it clones the array-of-arrays and computes the next layout. For each square, see if up/down/left/right is the guy who hates you (enemy 'e'), if so, he takes you over.
The main while loop just reads input, runs k iterations of battle, and prints output as per the spec.
Input:
3 4 4 3
0 1 2 0
1 0 2 0
0 1 2 0
0 1 2 2
4 2 3 4
1 0 3
2 1 2
0 0 0 0
Output:
2220
2101
2220
0200
103
212
Python 2.6, 383 376 Characters
This code is inspired by Steve Losh's answer:
import sys
A=range
l=lambda:map(int,raw_input().split())
def x(N,R,C,K):
    if not N:return
    m=[l()for _ in A(R)];n=[r[:]for r in m]
    def u(r,c):z=m[r][c];n[r][c]=(z-((z-1)%N in[m[r+s][c+d]for s,d in(-1,0),(1,0),(0,-1),(0,1)if 0<=r+s<R and 0<=c+d<C]))%N
    for i in A(K):[u(r,c)for r in A(R)for c in A(C)];m=[r[:]for r in n]
    for r in m:print' '.join(map(str,r))
    x(*l())
x(*l())
Haskell (GHC 6.8.2), 570 446 415 413 388 Characters
Minimized:
import Monad
import Array
import List
f=map
d=getLine>>=return.f read.words
h m k=k//(f(\(a@(i,j),e)->(a,maybe e id(find(==mod(e-1)m)$f(k!)$filter(inRange$bounds k)[(i-1,j),(i+1,j),(i,j-1),(i,j+1)])))$assocs k)
main=do[n,r,c,k]<-d;when(n>0)$do g<-mapM(const d)[1..r];mapM_(\i->putStrLn$unwords$take c$drop(i*c)$f show$elems$(iterate(h n)$listArray((1,1),(r,c))$concat g)!!k)[0..r-1];main
The code above is based on the (hopefully readable) version below. Perhaps the most significant difference with sth's answer is that this code uses Data.Array.IArray instead of nested lists.
import Control.Monad
import Data.Array.IArray
import Data.List

type Index = (Int, Int)
type Heir = Int
type Kingdom = Array Index Heir

-- Given the dimensions of a kingdom and a county, return its neighbors.
neighbors :: (Index, Index) -> Index -> [Index]
neighbors dim (i, j) =
    filter (inRange dim) [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]

-- Given the first non-Heir and a Kingdom, calculate the next iteration.
iter :: Heir -> Kingdom -> Kingdom
iter m k = k // (
    map (\(i, e) -> (i, maybe e id (find (== mod (e - 1) m) $
                                    map (k !) $ neighbors (bounds k) i))) $
    assocs k)

-- Read a line of integers from stdin.
readLine :: IO [Int]
readLine = getLine >>= return . map read . words

-- Print the given kingdom, assuming the specified number of rows and columns.
printKingdom :: Int -> Int -> Kingdom -> IO ()
printKingdom r c k =
    mapM_ (\i -> putStrLn $ unwords $ take c $ drop (i * c) $ map show $ elems k)
          [0..r-1]

main :: IO ()
main = do
    [n, r, c, k] <- readLine                 -- read number of heirs, rows, columns and iters
    when (n > 0) $ do                        -- observe that 0 heirs implies [0, 0, 0, 0]
        g <- sequence $ replicate r readLine -- read initial state of the kingdom
        printKingdom r c $                   -- print kingdom after k iterations
            (iterate (iter n) $ listArray ((1, 1), (r, c)) $ concat g) !! k
        main                                 -- handle next test case
AWK - 245
A bit late, but nonetheless... The data is held in a 1-D array. Using a 2-D array the solution is about 30 chars longer.
NR<2{N=$1;R=$2;C=$3;K=$4;M=0}NR>1{for(i=0;i++<NF;)X[M++]=$i}END{for(k=0;k++<K;){
for(i=0;i<M;){Y[i++]=X[i-(i%C>0)]-(b=(N-1+X[i])%N)&&X[i+((i+1)%C>0)]-b&&X[i-C]-b
&&X[i+C]-b?X[i]:b}for(i in Y)X[i]=Y[i]}for(i=0;i<M;)printf"%s%d",i%C?" ":"\n",
X[i++]}