order of growth in algorithms - algorithm

Suppose that you time a program as a function of N and produce
the following table.
N seconds
-------------------
19683 0.00
59049 0.00
177147 0.01
531441 0.08
1594323 0.44
4782969 2.46
14348907 13.58
43046721 74.99
129140163 414.20
387420489 2287.85
Estimate the order of growth of the running time as a function of N.
Assume that the running time obeys a power law T(N) ~ a N^b. For your
answer, enter the constant b. Your answer will be marked as correct
if it is within 1% of the target answer - we recommend using
two digits after the decimal separator, e.g., 2.34.
Can someone explain how to calculate this?

Well, it is a simple mathematical problem.
I:  a * 387420489^b = 2287.85  ->  a = 2287.85 / 387420489^b
II: a * 43046721^b  = 74.99    ->  a = 74.99 / 43046721^b
III (from I and II): 2287.85 / 387420489^b = 74.99 / 43046721^b  ->  (387420489 / 43046721)^b = 9^b = 2287.85 / 74.99
-> http://www.purplemath.com/modules/solvexpo2.htm
Use logarithms to solve.
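A quick Python check of that approach, using the last two rows of the table (the constant a cancels when dividing the two equations):

import math

# b from the ratio of the two equations, then a by back-substitution
b = math.log(2287.85 / 74.99) / math.log(387420489 / 43046721)
a = 2287.85 / 387420489 ** b
print(b)   # ~1.556
print(a)   # the constant factor in T(N) ~ a * N^b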

1. Calculate the ratio of growth from one row to the next:
N seconds
--------------------
14348907 13.58
43046721 74.99
129140163 414.2
387420489 2287.85
2. Calculate the ratio of change for N:
43046721 / 14348907 = 3
129140163 / 43046721 = 3
therefore the rate of change for N is 3.
3. Calculate the ratio of change for seconds:
74.99 / 13.58 = 5.52
Now let's check the ratio for one more pair of rows to be sure:
414.2 / 74.99 = 5.52
so the ratio of change for seconds is 5.52.
4. Set up the following equation:
3^b = 5.52
b = log(5.52) / log(3) ≈ 1.55
Finally, we get that the order of growth of the running time is about N^1.55.
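For reference, a small Python sketch of this ratio method run over the last few rows of the table (it prints b around 1.55-1.56 for every pair):

import math

# For consecutive rows N grows by a factor of 3, so b = log(time ratio) / log(3).
data = [(14348907, 13.58), (43046721, 74.99), (129140163, 414.20), (387420489, 2287.85)]
for (n1, t1), (n2, t2) in zip(data, data[1:]):
    b = math.log(t2 / t1) / math.log(n2 / n1)
    print(f"N ratio {n2 / n1:.0f}, time ratio {t2 / t1:.2f}, b ~= {b:.3f}")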

Related

Why does coxph() combined with cluster() give much smaller standard errors than other methods to adjust for clustering (e.g. coxme() or frailty())?

I am working on a dataset to test the association between empirical antibiotics (variable emp, the antibiotics are cefuroxime or ceftriaxone compared with a reference antibiotic) and 30-day mortality (variable mort30). The data comes from patients admitted in 6 hospitals (variable site2) with a specific type of infection. Therefore, I would like to adjust for this clustering of patients on hospital level.
First I did this using the coxme() function for mixed models. However, based on visual inspection of the Schoenfeld residuals there were violations of the proportional hazards assumption and I tried adding a time transformation (tt) to the model. Unfortunately, the coxme() does not offer the possibility for time transformations.
Therefore, I tried other options to adjust for the clustering, including coxph() combined with frailty() and cluster(). Surprisingly, the standard errors I get using the cluster() option are much smaller than using coxme() or frailty().
Does anyone know what the explanation for this is, and which option would provide the most reliable estimates?
1) Using coxme:
> uni.mort <- coxme(Surv(FUdur30, mort30num) ~ emp + (1 | site2), data = total.pop)
> summary(uni.mort)
Cox mixed-effects model fit by maximum likelihood
Data: total.pop
events, n = 58, 253
Iterations= 24 147
NULL Integrated Fitted
Log-likelihood -313.8427 -307.6543 -305.8967
Chisq df p AIC BIC
Integrated loglik 12.38 3.00 0.0061976 6.38 0.20
Penalized loglik 15.89 3.56 0.0021127 8.77 1.43
Model: Surv(FUdur30, mort30num) ~ emp + (1 | site2)
Fixed coefficients
coef exp(coef) se(coef) z p
empCefuroxime 0.5879058 1.800214 0.6070631 0.97 0.33
empCeftriaxone 1.3422317 3.827576 0.5231278 2.57 0.01
Random effects
Group Variable Std Dev Variance
site2 Intercept 0.2194737 0.0481687
> confint(uni.mort)
2.5 % 97.5 %
empCefuroxime -0.6019160 1.777728
empCeftriaxone 0.3169202 2.367543
2) Using frailty()
uni.mort <- coxph(Surv(FUdur30, mort30num) ~ emp + frailty(site2), data = total.pop)
> summary(uni.mort)
Call:
coxph(formula = Surv(FUdur30, mort30num) ~ emp + frailty(site2),
data = total.pop)
n= 253, number of events= 58
coef se(coef) se2 Chisq DF p
empCefuroxime 0.6302 0.6023 0.6010 1.09 1.0 0.3000
empCeftriaxone 1.3559 0.5221 0.5219 6.75 1.0 0.0094
frailty(site2) 0.40 0.3 0.2900
exp(coef) exp(-coef) lower .95 upper .95
empCefuroxime 1.878 0.5325 0.5768 6.114
empCeftriaxone 3.880 0.2577 1.3947 10.796
Iterations: 7 outer, 27 Newton-Raphson
Variance of random effect= 0.006858179 I-likelihood = -307.8
Degrees of freedom for terms= 2.0 0.3
Concordance= 0.655 (se = 0.035 )
Likelihood ratio test= 12.87 on 2.29 df, p=0.002
3) Using cluster()
uni.mort <- coxph(Surv(FUdur30, mort30num) ~ emp, cluster = site2, data = total.pop)
> summary(uni.mort)
Call:
coxph(formula = Surv(FUdur30, mort30num) ~ emp, data = total.pop,
cluster = site2)
n= 253, number of events= 58
coef exp(coef) se(coef) robust se z Pr(>|z|)
empCefuroxime 0.6405 1.8975 0.6009 0.3041 2.106 0.035209 *
empCeftriaxone 1.3594 3.8937 0.5218 0.3545 3.834 0.000126 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
empCefuroxime 1.897 0.5270 1.045 3.444
empCeftriaxone 3.894 0.2568 1.944 7.801
Concordance= 0.608 (se = 0.027 )
Likelihood ratio test= 12.08 on 2 df, p=0.002
Wald test = 15.38 on 2 df, p=5e-04
Score (logrank) test = 10.69 on 2 df, p=0.005, Robust = 5.99 p=0.05
(Note: the likelihood ratio and score tests assume independence of
observations within a cluster, the Wald and robust score tests do not).
>

Sum of the following ncr series in O(1) time complexity

There are multiple queries of the form
Q(n,m) = (nC1*mC1) + (nC2*mC2) + (nC3*mC3) ... (nCk*mCk) where
k=min(n,m)
How to find the value of Q(n,m) in O(1) time complexity.
I tried pre-computing ncr[N][N] matrix and dp[N][N][N] where dp[n][m][min(n,m)] = Q(n,m).
This pre-computation takes O(N^3) time and queries can be answered in O(1) time now. But I'm looking for an approach in which pre-computation shouldn't take more O(N^2) time.
The solution for the sum starting from C(n,0)*C(m,0) is pretty simple:
Q0(n,m) = C(n+m, m)
So for your formulation just subtract 1
Q(n,m) = C(n+m, m) - 1
Example: n=9, m=5
Dot product of 9-th and 5-th rows of Pascal's triangle is
1 9 36 84 126 126 84 36 9 1
1 5 10 10 5 1
1 + 45 + 360 + 840 + 630 + 126 = 2002 = C(14,5)
It might be proved by mathematical induction starting from Q(n,1), but the expressions are rather long. (It is also a form of Vandermonde's identity, since C(m,i) = C(m,m-i).)
I have discovered a truly marvelous demonstration of this proposition that this margin is too narrow to contain © Fermat ;)
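A minimal Python sketch of answering queries with this closed form, assuming exact integers and no modulus (math.comb needs Python 3.8+):

from math import comb

# Q(n, m) = C(n + m, m) - 1, so each query is O(1) arithmetic operations.
def Q(n, m):
    return comb(n + m, m) - 1

print(Q(9, 5))   # 2001 = 2002 - 1, matching the Pascal's-triangle example above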

Iterating to figure out combinations of variable number of elements

I'm making a program that uses dynamic programming to decide how to distribute some files (Movies) among DVDs so that it uses the least number of DVDs.
After much thought I decided that a good way to do it is to look at every possible combination of movies that is less than 4.38 GB (the actual capacity of a DVD), pick the largest one (i.e. the one that wastes the least space), remove the movies in that combination, and repeat until it runs out of movies.
The problem is that I don't know how to loop so I can figure out every possible combination, given that movies vary in size, so a specific number of nested loops cannot be used.
pseudo-code:
Some kind of loop:
    best_combination = []
    best_combination_size = 0
    if current_combination_size < 4.38 and current_combination_size > best_combination_size:
        best_combination = current_combination
        best_combination_size = current_combination_size
print(best_combination)
delete best_combination from list_of_movies
first time to post a question..so go easy on me guys!!
Thanks in advance
P.S. I figured out a way to do it using Dijkstra's which I think would be fast but not memory friendly. If anybody is interested i would gladly discuss it.
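For the loop itself, here is a brute-force Python sketch of the approach described in the question (movie sizes are made-up values in GB; it enumerates every subset, which grows exponentially, so the answers below recommend bin-packing heuristics instead):

from itertools import combinations

DVD_SIZE = 4.38
movies = [1.2, 0.7, 3.9, 2.1, 1.6, 0.4]   # hypothetical sizes, each <= DVD_SIZE

remaining = list(movies)
dvds = []
while remaining:
    best, best_size = (), 0.0
    # "variable number of elements": try combinations of every possible length
    for r in range(1, len(remaining) + 1):
        for combo in combinations(remaining, r):
            size = sum(combo)
            if best_size < size <= DVD_SIZE:
                best, best_size = combo, size
    dvds.append(best)
    for movie in best:
        remaining.remove(movie)
print(dvds)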
You should really stick to common bin-packing heuristics. The Wikipedia article gives a good overview of approaches, including links to problem-tailored exact approaches. But always keep in mind: it's an NP-complete problem!
I will show you an example supporting my hint that you should stick to heuristics.
The following Python code:
creates parameterized random problems (normal distributions with multiple means/stds; acceptance sampling to make sure that no movie is bigger than a DVD)
uses some random binpacking library which implements some greedy heuristic (I didn't try or test this lib before, so no guarantees; no idea which heuristic is used)
uses a naive mixed-integer programming formulation (which is solved by a commercial solver; open-source solvers like CBC struggle, but might be used for good approximate solutions)
Code
import numpy as np
from cvxpy import *
from time import time

""" Generate some test-data """
np.random.seed(1)
N = 150  # movies
means = [700, 1400, 4300]
stds = [100, 300, 500]
DVD_SIZE = 4400

movies = []
for movie in range(N):
    while True:
        random_mean_index = np.random.randint(low=0, high=len(means))
        random_size = np.random.randn() * stds[random_mean_index] + means[random_mean_index]
        if random_size <= DVD_SIZE:
            movies.append(random_size)
            break

""" HEURISTIC SOLUTION """
import binpacking
start = time()
bins = binpacking.to_constant_volume(movies, DVD_SIZE)
end = time()
print('Heuristic solution: ')
for b in bins:
    print(b)
print('used bins: ', len(bins))
print('used time (seconds): ', end-start)

""" Preprocessing """
movie_sizes_sorted = sorted(movies)
max_movies_per_dvd = 0
occupied = 0
for i in range(N):
    if occupied + movie_sizes_sorted[i] <= DVD_SIZE:
        max_movies_per_dvd += 1
        occupied += movie_sizes_sorted[i]
    else:
        break

""" Solve problem """
# Variables
X = Bool(N, N)  # N * number-DVDS
I = Bool(N)     # indicator: DVD used

# Constraints
constraints = []
# (1) DVDs not overfilled
for dvd in range(N):
    constraints.append(sum_entries(mul_elemwise(movies, X[:, dvd])) <= DVD_SIZE)
# (2) All movies distributed exactly once
for movie in range(N):
    constraints.append(sum_entries(X[movie, :]) == 1)
# (3) Indicators
for dvd in range(N):
    constraints.append(sum_entries(X[:, dvd]) <= I[dvd] * (max_movies_per_dvd + 1))

# Objective
objective = Minimize(sum_entries(I))

# Problem
problem = Problem(objective, constraints)
start = time()
problem.solve(solver=GUROBI, MIPFocus=1, verbose=True)
#problem.solve(solver=CBC, CliqueCuts=True)#, GomoryCuts=True, KnapsackCuts=True, verbose=True)#, GomoryCuts=True, MIRCuts=True, ProbingCuts=True,
#CliqueCuts=True, FlowCoverCuts=True, LiftProjectCuts=True,
#verbose=True)
end = time()

""" Print solution """
for dvd in range(N):
    movies_ = []
    for movie in range(N):
        if np.isclose(X.value[movie, dvd], 1):
            movies_.append(movies[movie])
    if movies_:
        print('DVD')
        for movie in movies_:
            print(' movie with size: ', movie)
print('Distributed ', N, ' movies to ', int(objective.value), ' dvds')
print('Optimization took (seconds): ', end-start)
Partial Output
Heuristic solution:
-------------------
('used bins: ', 60)
('used time (seconds): ', 0.0045168399810791016)
MIP-approach:
-------------
Root relaxation: objective 2.142857e+01, 1921 iterations, 0.10 seconds
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
0 0 21.42857 0 120 106.00000 21.42857 79.8% - 0s
H 0 0 68.0000000 21.42857 68.5% - 0s
H 0 0 63.0000000 21.42857 66.0% - 0s
0 0 21.42857 0 250 63.00000 21.42857 66.0% - 1s
H 0 0 62.0000000 21.42857 65.4% - 2s
0 0 21.42857 0 256 62.00000 21.42857 65.4% - 2s
0 0 21.42857 0 304 62.00000 21.42857 65.4% - 2s
0 0 21.42857 0 109 62.00000 21.42857 65.4% - 3s
0 2 21.42857 0 108 62.00000 21.42857 65.4% - 4s
40 2 27.61568 20 93 62.00000 27.61568 55.5% 110 5s
H 156 10 61.0000000 58.00000 4.92% 55.3 8s
262 4 59.00000 84 61 61.00000 59.00000 3.28% 44.2 10s
413 81 infeasible 110 61.00000 59.00000 3.28% 37.2 15s
H 417 78 60.0000000 59.00000 1.67% 36.9 15s
1834 1212 59.00000 232 40 60.00000 59.00000 1.67% 25.7 20s
...
...
57011 44660 infeasible 520 60.00000 59.00000 1.67% 27.1 456s
57337 44972 59.00000 527 34 60.00000 59.00000 1.67% 27.1 460s
58445 45817 59.00000 80 94 60.00000 59.00000 1.67% 26.9 466s
59387 46592 59.00000 340 65 60.00000 59.00000 1.67% 26.8 472s
Analysis
Some observations regarding the example above:
the heuristic obtains a solution of value 60 instantly
the commercial solver needs more time but also finds a solution of value 60 (15 secs)
it then tries to find a better solution or a proof that none exists (MIP solvers are complete: given infinite time they find the optimal solution or prove that no better one exists!)
no progress for some time!
but: we have a proof that no solution better than 59 exists
= maybe you can save one DVD by solving the problem optimally; but such a solution is hard to find, and we don't know yet whether it even exists!
Remarks
The observations above depend heavily on the data statistics
It's easy to try other (maybe smaller) problems where the commercial MIP solver finds a solution which uses one DVD less (e.g. 49 vs. 50)
It's probably not worth it (remember: open-source solvers struggle even more)
The formulation is very simple and not tuned at all (don't blame only the solvers)
There are exact algorithms (which might be much more complex to implement) which could be appropriate
Conclusion
Heuristics are very easy to implement and provide very good solutions in general. Most of them also come with very good theoretical guarantees (e.g. first fit decreasing uses at most 11/9 * OPT + 1 DVDs compared to the optimal solution). Despite the fact that I'm keen on optimization in general, I would probably use the heuristics approach here.
The general problem is also very popular, so there should exist a good library for it in many programming languages!
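As a reference point, a minimal Python sketch of the first fit decreasing heuristic mentioned above (item sizes and capacity are arbitrary example values):

def first_fit_decreasing(sizes, capacity):
    # Sort items by size (largest first), put each into the first bin it still
    # fits into, and open a new bin when no existing bin has room.
    bins = []
    for size in sorted(sizes, reverse=True):
        for b in bins:
            if sum(b) + size <= capacity:
                b.append(size)
                break
        else:
            bins.append([size])
    return bins

print(first_fit_decreasing([700, 1400, 4300, 2000, 2400, 900], 4400))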
Without claiming that the solution this answer presents is optimized, optimal, or possesses any other remarkable qualities, here is a greedy approach to the DVD packing problem.
import System.Random
import Data.List
import Data.Ord

-- F# programmers are so used to this operator, I just cannot live without it ...yet.
(|>) a b = b a

data Dvd = Dvd { duration :: Int, movies :: [Int] } deriving (Show,Eq)

dvdCapacity = 1000 :: Int  -- a dvd has capacity for 1000 time units - arbitrary unit

-- the duration of a movie is between 1 and 1000 time units
r = randomRs (1,1000) (mkStdGen 42) :: [Int]

-- our test data set of 1000 movies, random movie durations drawn from r
allMovies = zip [1..1000] (take 1000 r)

allMoviesSorted = reverse $ sortBy (comparing snd) allMovies

remainingCapacity dvd = dvdCapacity - duration dvd

emptyDvd = Dvd { duration = 0, movies = [] }

-- from the remaining movies, pick the largest one with at most maxDuration length.
pickLargest remaining maxDuration =
    let (left,right) = remaining |> break (\ (a,b) -> b <= maxDuration)
        (h,t) = case right of
            [] -> (Nothing,[])
            otherwise -> (Just (head right), right |> tail)
    in
        (h,[left, t] |> concat)

-- add a track (movie) to a dvd
addTrack dvd track =
    Dvd { duration = (duration dvd) + snd track,
          movies = fst track : (movies dvd) }

-- pick dvd from dvds with largest remaining capacity
-- and add the largest remaining fitting track
greedyPack movies dvds
    | movies == [] = (dvds,[])
    | otherwise =
        let dvds' = reverse $ sortBy (comparing remainingCapacity) dvds
            (picked,movies') =
                case dvds' of
                    [] -> (Nothing, movies)
                    (x:xs) -> pickLargest movies (remainingCapacity x)
        in
            case picked of
                Nothing ->
                    -- None of the current dvds had enough capacity remaining
                    -- to pick another movie and add it. -> Add new empty dvd
                    -- and run greedyPack again.
                    greedyPack movies' (emptyDvd : dvds')
                Just p ->
                    -- The best fitting movie could be added to the
                    -- dvd with the largest remaining capacity.
                    greedyPack movies' (addTrack (head dvds') p : tail dvds')

(result,residual) = greedyPack allMoviesSorted [emptyDvd]

usedDvdsCount = length result

totalPlayTime = allMovies |> foldl (\ s (i,d) -> s + d) 0

optimalDvdsCount = round $ 0.5 + fromIntegral totalPlayTime / fromIntegral dvdCapacity

solutionQuality = length result - optimalDvdsCount
Compared to the theoretical optimal dvd count it wastes 4 extra dvds on the given data set.

How to calculate execution time (speedup)

I was stuck when trying to calculate for the speedup. So the question given was:
Question 1
If 50% of a program is enhanced by 2 times and the rest 50% is enhanced by 4 times then what is the overall speedup due to the enhancements? Hints: Consider that the execution time of program in the machine before enhancement (without enhancement) is T. Then find the total execution time after the enhancements, T'. The speedup is T/T'.
The only thing I know is speedup = execution time before enhancement/execution time after enhancement. So can I assume the answer is:
Speedup = T/((50/100x1/2) + (50/100x1/4))
Total execution time after the enhancement = T + speedup
(50/100x1/2) because 50% was enhanced by 2 times and same goes to the 4 times.
Question 2
Let us hypothetically imagine that the execution of (2/3)rd of a program could be made to run infinitely fast by some kind of improvement/enhancement in the design of a processor. Then how many times the enhanced processor will run faster compared with the un-enhanced (original) machine?
Can I assume that it is 150 times faster as 100/(2/3) = 150
Any ideas? Thanks in advance.
Let's start with question 1.
The total time is the sum of the times for the two halves:
T = T1 + T2
Then, T1 is enhanced by a factor of two. T2 is improved by a factor of 4:
T' = T1' + T2'
= T1 / 2 + T2 / 4
We know that both T1 and T2 are 50% of T. So:
T' = 0.5 * T / 2 + 0.5 * T / 4
= 1/4 * T + 1/8 * T
= 3/8 * T
The speed-up is
T / T' = T / (3/8 T) = 8/3
Question two can be solved similarly:
T' = T1' + T2'
T1' is reduced to 0. T2 is the remaining 1/3 of T.
T' = 1/3 T
The speed-up is
T / T' = 3
Hence, the program is three times as fast as before (or two times faster).
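The same accounting in a few lines of Python, handy for checking variations of these questions (it assumes the fractions refer to the original execution time T and sum to 1):

def overall_speedup(parts):
    # parts: list of (fraction of original time T, local speedup for that part)
    t_prime_over_t = sum(f / s for f, s in parts)
    return 1.0 / t_prime_over_t

print(overall_speedup([(0.5, 2), (0.5, 4)]))              # question 1: 8/3 ~ 2.67
print(overall_speedup([(2/3, float('inf')), (1/3, 1)]))   # question 2: 3.0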

Math algorithm question

I'm not sure if this can be done without some determining factor....but wanted to see if someone knew of a way to do this.
I want to create a shifting scale for numbers.
Let's say I have the number 26000. I want the outcome of this algorithm to be 6500; or 25% of the original number. But if I have the number 5000, I want the outcome to be 2500; or 50% of the original number.
The percentages don't have to be exact, this is just an example.
I just want to have like a sine wave sort of thing. As the input number gets higher, the output number is a lower percentage of the input.
Does that make sense?
Plot some points in Excel and use the "show formula" option on the line.
Something like f(x) = x / log x (base-10 logarithm)?
x | f(x)
=======================
26000 | 5889 (22.6 %)
5000 | 1351 (27.2 %)
100000 | 20000 (20 %)
1000000 | 166666 (16.6 %)
Just a simple example. You can tweak it by playing with the base of the logarithm, by adding multiplicative constants on the numerator (x) or denominator (log x), by using square roots, squaring (or taking the root of) log x or x etc.
Here's what f(x) = 2*log(x)^2*sqrt(x) gives (again with a base-10 log):
x | f(x)
=======================
26000 | 6285 (24 %)
5000 | 1934 (38 %)
500 | 325 (65 %)
100 | 80 (80 %)
1000000 | 72000 (7.2 %)
100000 | 15811 (15 %)
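If you want to experiment, both candidate curves can be tabulated with a few lines of Python (base-10 logs, matching the tables above up to rounding):

import math

def f1(x):
    return x / math.log10(x)

def f2(x):
    return 2 * math.log10(x) ** 2 * math.sqrt(x)

for x in (5000, 26000, 100000, 1000000):
    print(x, round(f1(x)), round(f2(x)))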
A suitable logarithmic scale might help.
It may be possible to define the function you want exactly if you specify a third transformation in addition to the two you've already mentioned. If you have some specific aim in mind, it's quite likely to fit a well-known mathematical definition which at least one poster could identify for you. It does sound as though you're talking about a logarithmic function. You'll have to be more specific about your requirements to define a useful algorithm however.
I'd suggest you play with the power law family of functions, c*x^a == c * pow(x,a) where a is the power. If you want an exact fraction of your answer, you would choose a=1 and it would just be a constant fraction. But you want the percentage to slowly decrease, so you could choose a<1. For example, we might choose a = 0.9 and c = 0.2 and get
1 0.2
10 1.59
100 12.6
1000 100.2
So it ranges from 20% at 1 to 10% at 1000. You can pick smaller a to make the fraction decrease more rapidly. (And you can scale everything to fit your range.)
In particular, if c*5000^a = 2500 and c*26000^a = 6500, then by dividing we get (5.2)^a = 2.6, which we can solve as a = log(2.6)/log(5.2) = 0.5796.... Then we plug back in to get c*139.25 = 2500, so c = 17.95...
Now the progression goes like so
1000 984
3000 1859
5000 2500
10000 3736
15000 4726
26000 6500
50000 9495
90000 13349
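A small Python sketch of fitting c*x^a through two anchor points, as done above (pick any two (x, y) anchors for your own scale):

import math

def fit_power_law(x1, y1, x2, y2):
    # Divide c*x2^a = y2 by c*x1^a = y1 to solve for a, then back-substitute for c.
    a = math.log(y2 / y1) / math.log(x2 / x1)
    c = y1 / x1 ** a
    return a, c

a, c = fit_power_law(5000, 2500, 26000, 6500)
print(a, c)                          # roughly 0.5796 and 17.95
for x in (1000, 5000, 26000, 90000):
    print(x, round(c * x ** a))      # reproduces the progression above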
This is somewhat similar to simple compression schemes used for analogue audio. See Wikipedia entry for Companding.
