Program Runtime HW Problem

An algorithm takes 0.5 ms for an input size of 100. How long will it take to run if the input size is 500 and the program is O(n lg(n))?
My book says that when the input size doubles, n lg(n) takes "slightly more than twice as long". That isn't really helping me much.
The way I've been doing it is by solving for the constant multiplier (which the book doesn't talk about, so I don't know if it's valid):
.5ms = c * 100 * lg(100) => c = .000753
So
.000753 * 500 * lg(500) = 3.37562ms
Is that a valid way to calculate running times, and is there a better way to figure it out?

Yes. That is exactly how it works.
Of course this ignores any possible initialization overhead, as this is not specified in big-o notation, but that's irrelevant for most algorithms.
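As a minimal sketch of the calculation in the question, the constant multiplier can be solved for once and reused for any other input size. The function names estimate_runtime and n_log_n are made up for this illustration, and the model assumes no fixed overhead at all.

```python
import math

def n_log_n(n):
    """Growth function for an O(n lg n) algorithm (lg is log base 2)."""
    return n * math.log2(n)

def estimate_runtime(t_known, n_known, n_new, growth):
    """Scale a measured runtime to a new input size.

    Assumes runtime = c * growth(n), with no overhead term.
    """
    c = t_known / growth(n_known)  # the constant multiplier from the question
    return c * growth(n_new)

if __name__ == "__main__":
    # 0.5 ms at n = 100, predicted for n = 500
    print(round(estimate_runtime(0.5, 100, 500, n_log_n), 2), "ms")
```

With an unrounded c this gives about 3.37 ms. The 3.37562 ms in the question differs slightly only because c was rounded to .000753 first.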

That's not exactly right. Tomas was right in saying there is overhead and the real equation is more like
runtime = inputSize * lg(inputSize) * singleInputProcessTime + overhead
The singleInputProcessTime has to do with machine operations like loading of address spaces, arithmetic or anything that must be done each time you interact with the input. This generally has a runtime that ranges from a few CPU cycles to seconds or minutes depending on your domain. It is important to understand that this time is roughly CONSTANT and therefore does not affect the overall runtime very much AT LARGE ENOUGH input sizes.
The overhead is cost of setting up the problem/solution such as reading the algorithm into memory, spreading the input amongst servers/processes or any operation which only needs to happen once or a set number of times which does NOT depend on the input size. This cost is also constant and can range anywhere from a few CPU cycles to minutes depending on the method used to solve the problem.
The inputSize and n * lg(n) you know about already, but as for your homework problem, as long as you explain how you got to the solution, you should be just fine.

Related

Time Complexity single loop vs multiple sequential loops

Today, my colleague and I had a small argument about one particular code snippet. The code looks something like this. At least, this is what he imagined it to be.
for (int i = 0; i < n; i++) {
    // Some operations here
}
for (int i = 0; i < m; i++) { // m is always small
    // Some more operations here
}
He wanted me to remove the second loop, since it would cause performance issues.
However, I was sure that since I don't have any nested loops here, the complexity will always be O(n), no matter how many sequential loops I put (only 2 we had).
His argument was that if n is 1,000,000 and the loop takes 5 seconds, my code will take 10 seconds, since it has 2 for loops. I was confused after this statement.
What I remember from my DSA lessons is that we ignore such constants while calculating Big Oh.
What am I missing here?

Yes,
complexity theory may help to compare two distinct methods of calculation in [?TIME][?SPACE],
but
do not use [PTIME] complexity as an argument for poor efficiency.
Fact #1: O( f(N) ) is relevant for comparing complexities in areas near N ~ INFTY, so it is the principal limits of the processes that can be compared "there".
Fact #2: Given N ~ { 10k | 10M | 10G }, none of these cases meets the condition cited above.
Fact #3: If the process ( algorithm ) allows the loops to be merged into a single pass without any side effects ( on resources / blocking / etc ), the single-loop processing may benefit from the reduced looping overhead.
A microbenchmark will decide, not the O( f( N ) ) for N ~ INFTY,
as many additional effects gain a stronger influence: better or poorer cache-line alignment, the amount of possible L1/L2/L3-cache re-use, and smarter use of more or fewer CPU registers, all of which are driven by possible compiler optimisations and may further increase code-execution speed for small N, beyond any expectation from the above.
So,
do perform several scaling-dependent microbenchmarks before resorting to arguing about the limits of O( f( N ) ).
Always do.
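A scaling-dependent microbenchmark of the kind recommended above could be sketched as follows, using Python's standard timeit module. The loop bodies, the sizes and the names two_loops, merged_loop and benchmark are assumptions made up for the illustration; real measurements should use the actual loop bodies and realistic data.

```python
import timeit

def two_loops(data, m):
    """Two sequential passes: one over all items, one over the first m."""
    total = 0
    for x in data:
        total += x
    for i in range(m):
        total += data[i] * 2
    return total

def merged_loop(data, m):
    """The same work folded into a single pass with a bounds check."""
    total = 0
    for i, x in enumerate(data):
        total += x
        if i < m:
            total += x * 2
    return total

def benchmark(sizes=(1_000, 100_000), m=10, repeats=3):
    """Return {n: (best two-loop time, best merged time)} in seconds."""
    results = {}
    for n in sizes:
        data = list(range(n))
        # Both variants must compute the same thing before timing them.
        assert two_loops(data, m) == merged_loop(data, m)
        results[n] = tuple(
            min(timeit.repeat(lambda: f(data, m), number=10, repeat=repeats))
            for f in (two_loops, merged_loop)
        )
    return results

if __name__ == "__main__":
    for n, (split, merged) in benchmark().items():
        print(f"n={n}: two loops {split:.4f}s, merged {merged:.4f}s")
```

Taking the minimum over several repeats reduces noise from other processes. Which variant wins depends on the interpreter, the hardware and the loop bodies, so no expected timings are given here.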

In asymptotic notation, your code has time complexity O(n + n) = O(2n) = O(n).
Side note:
If the first loop takes n iterations and the second loop m, then the time complexity would be O(n + m).
PS: I assume that the bodies of your for loops are not heavy enough to affect the overall complexity, as you mentioned too.
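To make the O(n + m) point concrete, here is a small sketch that counts loop-body executions for sequential and for nested loops. The function names are made up for the illustration, and each loop body is assumed to cost one constant-time step.

```python
def sequential_steps(n, m):
    """Count body executions for two loops run one after the other."""
    steps = 0
    for _ in range(n):
        steps += 1
    for _ in range(m):
        steps += 1
    return steps  # n + m

def nested_steps(n, m):
    """Count body executions when the second loop sits inside the first."""
    steps = 0
    for _ in range(n):
        for _ in range(m):
            steps += 1
    return steps  # n * m

if __name__ == "__main__":
    print(sequential_steps(1000, 10))  # 1010
    print(nested_steps(1000, 10))      # 10000
```

Adding a second sequential loop only adds to the step count, while nesting multiplies it. That is why sequential loops leave the complexity class unchanged even though they still cost real time.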

You may be confusing time complexity and performance. These are two different (but related) things.
Time complexity deals with comparing the rate of growth of algorithms and ignores constant factors and messy real-world conditions. These simplifications make it a valuable theoretical framework for reasoning about algorithm scalability.
Performance is how fast code runs on an actual computer. Unlike in Big O-land, constant factors exist and often play a dominant role in determining execution time. Your coworker is reasonable to acknowledge this. It's easy to forget that O(1000000n) is the same as O(n) in Big O-land, but to an actual computer, the constant factor is a very real thing.
The bird's-eye view that Big O provides is still valuable; it can help determine if your coworker is getting lost in the details and pursuing a micro-optimization.
Furthermore, your coworker considers simple instruction counting as a step towards comparing actual performance of these loop arrangements, but this is still a major simplification. Consider cache characteristics; out-of-order execution potential; friendliness to prefetching, loop unrolling, vectorization, branch prediction, register allocation and other compiler optimizations; garbage collection/allocation overhead and heap vs stack memory accesses as just a few of the factors that can make enormous differences in execution time beyond including simple operations in the analysis.
For example, if your code is something like
for (int i = 0; i < n; i++) {
    foo(arr[i]);
}
for (int i = 0; i < m; i++) {
    bar(arr[i]);
}
and n is large enough that arr doesn't fit neatly in the cache (perhaps elements of arr are themselves large, heap-allocated objects), you may find that the second loop has a dramatically harmful effect due to having to bring evicted blocks back into the cache all over again. Rewriting it as
for (int i = 0, end = max(n, m); i < end; i++) {
    if (i < n) {
        foo(arr[i]);
    }
    if (i < m) {
        bar(arr[i]);
    }
}
may have a disproportionate efficiency increase because blocks from arr are brought into the cache once. The if statements might seem to add overhead, but branch prediction may make the impact negligible, avoiding pipeline flushes.
Conversely, if arr fits in the cache, the second loop's performance impact may be negligible (particularly if m is bounded and, better still, small).
Yet again, what is happening in foo and bar could be a critical factor. There simply isn't enough information here to tell which is likely to run faster by looking at these snippets, simple as they are, and the same applies to the snippets in the question.
In some cases, the compiler may have enough information to generate the same code for both of these examples.
Ultimately, the only hope to settle debates like this is to write an accurate benchmark (not necessarily an easy task) that measures the code under its normal working conditions (not always possible) and evaluate the outcome against other constraints and metrics you may have for the app (time, budget, maintainability, customer needs, energy efficiency, etc...).
If the app meets its goals or business needs either way, it may be premature to debate performance. Profiling is a great way to determine if the code under discussion is even a problem. See Eric Lippert's Which is Faster? which makes a strong case for (usually) not worrying about these sorts of things.
This is a benefit of Big O--if two pieces of code only differ by a small constant factor, there's a decent chance it's not worth worrying about until it proves to be worth attention through profiling.

estimate performance gain based on application profiling (math)

this question is rather "math" related - but certainly is of interest to any "software developer".
i have done some profiling of my application. and i have observed there is a huge performance difference that is "environment specific".
there is a "fast" environment and a "slow" environment.
the overall application performance on "fast" is 5 times faster than on "slow".
a particular function call on "fast" is 18 times faster than on "slow".
so let's assume i will be able to reduce invoking this particular function by 50 percent.
how do i calculate the estimated performance improvement on the "slow" environment?
is there any approximate formula for calculating the expected performance gain?
apologies:
nowadays i'm no longer good at doing any math. or rather i never was!
i have been thinking about where to ask such question, best.
didn't come up with any more suitable place.
also, i wasn't able to come up with an optimal question's subject line and also what tags to assign ...

Let's make an assumption (questionable but we have nothing else to go on).
Let's assume all of the 5:1 reduction in time is due to function foo reducing by 18:1.
That means everything else in the program takes the same amount of time.
So suppose in the fast environment the total time is f + x, where f is the time that foo takes in the fast environment, and x is everything else.
In the slow environment, the time is 18f+x, which equals 5(f+x).
OK, solve for x.
18f+x = 5f+5x
13f = 4x
x = 13/4 f
OK, now on the slow environment you want to call foo half as much.
So then the time would be 9f+x, which is:
9f + 13/4 f = 49/4 f
The original time was 18f+x = (18+13/4)f = 85/4 f
So the time goes from 85/4 f to 49/4 f.
That's a speed ratio of 85/49 = 1.73
In other words, that's a speedup of 73%.
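The same derivation can be checked with exact fractions. This is a sketch under the same questionable assumption as above, namely that foo alone accounts for the whole difference between the environments. The function name slow_env_speedup and its parameters are made up for the illustration.

```python
from fractions import Fraction

def slow_env_speedup(total_ratio, func_ratio, call_fraction):
    """Speed ratio on the slow environment when only call_fraction of the
    calls to foo remain.

    Model: fast time is f + x, slow time is func_ratio*f + x, and the slow
    time equals total_ratio * (f + x). Everything is expressed in units of f.
    """
    f = Fraction(1)
    # Solving func_ratio*f + x = total_ratio*(f + x) for x:
    x = Fraction(func_ratio - total_ratio, total_ratio - 1) * f
    before = func_ratio * f + x
    after = func_ratio * f * Fraction(call_fraction) + x
    return before / after

if __name__ == "__main__":
    ratio = slow_env_speedup(5, 18, Fraction(1, 2))
    print(ratio, float(ratio))  # 85/49, roughly 1.73
```

Changing call_fraction shows how the gain flattens out: even removing every call to foo cannot make the slow environment faster than the time x taken by everything else.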

Time taken in executing % / * + - operations

Recently, i heard that % operator is costly in terms of time.
So, the question is that, is there a way to find the remainder faster?
Also your help will be appreciated if anyone can tell the difference in the execution of % / * + - operations.

In some cases where you're using power-of-2 divisors you can do better with roll-your-own techniques for calculating remainder, but generally a halfway decent compiler will do the best job possible with variable divisors, or "odd" divisors that don't fit any pattern.
Note that a few CPUs don't even have a multiply operation, and so (on those) multiply is quite slow vs add (at least 64x for a 32-bit multiply). (But a smart compiler may improve on this if the multiplier is a literal.) A slightly larger number do not have a divide operation or have a pretty slow one. (On a CPU with a fast multiplier multiply may only be on the order of 4 times slower than add, but on "normal" hardware it's 16-32 times slower for a 32 bit operation. Divide is inherently 2-4x slower than multiply, but can be much slower on some hardware.)
The remainder operation is rarely implemented in hardware, and normally A % B maps to something along the lines of A - ((A / B) * B) (a few extra operations may be required to assure the proper sign, et al).
(I learned about this stuff while microprogramming the instruction set for the SUMC computer for RCA/NASA back in the early 70s.)
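The two remainder techniques mentioned above can be sketched in a few lines. The function names are made up for the illustration. Note that Python's // floors toward negative infinity, so the sign convention here matches Python's %, whereas C-family languages truncate toward zero; the bitmask version is only valid for non-negative values and power-of-2 divisors.

```python
def remainder_via_division(a, b):
    """A % B expressed as A - ((A / B) * B), using floor division."""
    return a - (a // b) * b

def remainder_power_of_two(a, b):
    """Remainder by masking the low bits.

    Assumes a >= 0 and b is a power of two (1, 2, 4, 8, ...).
    """
    return a & (b - 1)

if __name__ == "__main__":
    print(remainder_via_division(17, 5))   # 2
    print(remainder_power_of_two(77, 8))   # 5
```

In compiled languages an optimising compiler will normally apply the mask trick itself when the divisor is a constant power of two, so writing it by hand is rarely necessary.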

No, the compiler is going to implement % in the most efficient way possible.
In terms of speed, + and - are the fastest (and are equally fast, generally done by the same hardware).
*, /, and % are much slower. Multiplication is basically done by the method you learn in grade school: multiply the first number by every digit in the second number and add the results, with some hacks made possible by binary. As of a few years ago, multiply was 3x slower than add. Division is generally slower than multiplication, often by a wide margin. Remainder is similar to division (in fact it generally calculates both at once).
Exact differences depend on the CPU type and exact model. You'd need to look up the latencies in the CPU spec sheets for your particular machine.

Finding the optimum file size combination

This is a problem I would think there is an algorithm for already - but I do not know the right words to use with google it seems :).
The problem: I would like to make a little program with which I would select a directory containing any files (but for my purpose media files, audio and video). After that I would like to enter in MB the maximum total file size sum that must not be exceeded. At this point you would hit a "Calculate best fit" button.
This button should compare all the files in the directory and provide as a result a list of the files that when put together gets most close to the max total file size without going over the limit.
This way you could find out which files to combine when burning a CD or DVD so that you will be able to use as much as possible of the disc.
I've tried to come up with the algorithm for this myself - but failed :(.
Anyone know of some nice algorithm for doing this?
Thanks in advance :)

Just for fun I tried out the accurate dynamic programming solution. Written in Python, because of my supreme confidence that you shouldn't optimise until you have to ;-)
This could provide either a start, or else a rough idea of how close you can get before resorting to approximation.
Code based on http://en.wikipedia.org/wiki/Knapsack_problem#0-1_knapsack_problem, hence the less-than-informative variable names m, W, w, v.
#!/usr/bin/env python3
import sys
solcount = 0
class Solution(object):
    def __init__(self, items):
        object.__init__(self)
        #self.items = items
        self.value = sum(items)
        global solcount
        solcount += 1
    def __str__(self):
        #return str(self.items) + ' = ' + str(self.value)
        return ' = ' + str(self.value)
# memo table: (number of remaining items, remaining capacity) -> best Solution
m = {}
def compute(v, w):
    coord = (len(v), w)
    if coord in m:
        return m[coord]
    if len(v) == 0 or w == 0:
        m[coord] = Solution([])
        return m[coord]
    newvalue = v[0]
    newarray = v[1:]
    # best result that leaves the first item out
    notused = compute(newarray, w)
    if newvalue > w:
        m[coord] = notused
        return notused
    # best result that includes the first item
    # used = Solution(compute(newarray, w - newvalue).items + [newvalue])
    used = Solution([compute(newarray, w - newvalue).value] + [newvalue])
    best = notused if notused.value >= used.value else used
    m[coord] = best
    return best
def main():
    v = [int(l) for l in open('filesizes.txt')]
    W = int(sys.argv[1])
    print(len(v), "items, limit is", W)
    print(compute(v, W))
    print(solcount, "solutions computed")
if __name__ == '__main__':
    main()
For simplicity I'm just considering the file sizes: once you have the list of sizes that you want to use, you can find some filenames with those sizes by searching through a list, so there's no point tangling up filenames in the core, slow part of the program. I'm also expressing everything in multiples of the block size.
As you can see, I've commented out the code that gives the actual solution (as opposed to the value of the solution). That was to save memory - the proper way to store the list of files used isn't one list in each Solution, it's to have each solution point back to the Solution it was derived from. You can then calculate the list of filesizes at the end by going back through the chain, outputting the difference between the values at each step.
With a list of 100 randomly-generated file sizes in the range 2000-6000 (I'm assuming 2k blocks, so that's files of size 4-12MB), this solves for W=40K in 100 seconds on my laptop. In doing so it computes 2.6M of a possible 4M solutions.
Complexity is O(W*n), where n is the number of files. This does not contradict the fact that the problem is NP-complete, because W is a numeric value rather than the length of the input: the running time is pseudo-polynomial, growing exponentially in the number of bits needed to write W. So I am at least approaching a solution, and this is just in unoptimised Python.
Clearly some optimisation is now required, because actually it needs to be solved for W=4M (8GB DVD) and however many files you have (let's say a few thousand). Presuming that the program is allowed to take 15 minutes (comparable to the time required to write a DVD), that means performance is currently short by a factor of roughly 10^3. So we have a problem that's quite hard to solve quickly and accurately on a PC, but not beyond the bounds of technology.
Memory use is the main concern, since once we start hitting swap we'll slow down, and if we run out of virtual address space we're in real trouble because we have to implement our own storage of Solutions on disk. My test run peaks at 600MB. If you wrote the code in C on a 32-bit machine, each "solution" has a fixed size of 8 bytes. You could therefore generate a massive 2-D array of them without doing any memory allocation in the loop, but in 2GB of RAM you could only handle W=4M and n=67. Oops - DVDs are out. It could very nearly solve for 2-k blocksize CDs, though: W=350k gives n=766.
Edit: MAK's suggestion to compute iteratively bottom-up, rather than recursively top-down, should massively reduce the memory requirement. First calculate m(1,w) for all 0 <= w <= W. From this array, you can calculate m(2,w) for all 0 <= w <= W. Then you can throw away all the m(1,w) values: you won't need them to calculate m(3,w) etc.
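A bottom-up version along those lines might look like the sketch below. It goes one step further than keeping two rows: a single row is updated in place, iterating capacities downwards, which is a common variant of the same idea. The name best_fill is made up for the illustration, and like the code above it returns only the best total, not the list of files.

```python
def best_fill(sizes, capacity):
    """Largest total of a subset of sizes that does not exceed capacity.

    sizes and capacity are non-negative integers (e.g. in blocks).
    Uses O(capacity) memory instead of O(capacity * len(sizes)).
    """
    # best[w] = largest achievable total within limit w,
    # using only the files processed so far
    best = [0] * (capacity + 1)
    for size in sizes:
        # Go downwards so that best[w - size] still refers to the row
        # *without* this file, i.e. each file is used at most once.
        for w in range(capacity, size - 1, -1):
            candidate = best[w - size] + size
            if candidate > best[w]:
                best[w] = candidate
    return best[capacity]

if __name__ == "__main__":
    print(best_fill([3, 5, 7], 10))  # 10 (3 + 7)
    print(best_fill([4, 6, 9], 8))   # 6
```

The running time is still O(W*n); only the memory requirement drops. Recovering the actual file list would need extra bookkeeping that this sketch leaves out.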
By the way, I suspect that actually the problem you want to solve might be the bin packing problem, rather than just the question of how to get the closest possible to filling a DVD. That's if you have a bunch of files, you want to write them all to DVD, using as few DVDs as possible. There are situations where solving the bin packing problem is very easy, but solving this problem is hard. For example, suppose that you have 8GB disks, and 15GB of small files. It's going to take some searching to find the closest possible match to 8GB, but the bin-packing problem would be trivially solved just by putting roughly half the files on each disk - it doesn't matter exactly how you divide them because you're going to waste 1GB of space whatever you do.
All that said, there are extremely fast heuristics that give decent results much of the time. Simplest is to go through the list of files (perhaps in decreasing order of size), and include each file if it fits, exclude it otherwise. You only need to fall back to anything slow if fast approximate solutions aren't "good enough", for your choice of "enough".
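The simple heuristic just described fits in a few lines. This is a sketch with a made-up function name, greedy_fill, working on file sizes only, as in the code above.

```python
def greedy_fill(sizes, capacity):
    """Take files in decreasing order of size, keeping each one that fits."""
    chosen = []
    remaining = capacity
    for size in sorted(sizes, reverse=True):
        if size <= remaining:
            chosen.append(size)
            remaining -= size
    return chosen

if __name__ == "__main__":
    print(greedy_fill([3, 5, 7], 10))  # [7, 3], fills the limit exactly
    print(greedy_fill([6, 5, 5], 10))  # [6], although 5 + 5 would fit better
```

The second call shows why this is only a heuristic: taking the largest file first can block a better combination. It runs in O(n log n) time, so it is cheap enough to try before anything slower.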

This is, as others pointed out, the Knapsack Problem, which is a combinatorial optimization problem. It means that you look for some subset or permutation of a set which minimizes (or maximizes) a certain cost. Another well known such problem is the Traveling Salesman Problem.
Such problems are usually very hard to solve. But if you're interested in almost optimal solutions, you can use non-deterministic algorithms, like simulated annealing. You most likely won't get the optimal solution, but a nearly optimal one.
This link explains how simulated annealing can solve the Knapsack Problem, and therefore should be interesting to you.

Sounds like you have a hard problem there. This problem is well known, but no efficient solutions (can?) exist.

Other than the obvious way of trying all permutations of objects with size < bucket, you could also have a look at the implementation of the bucketizer perl module, which does exactly what you are asking for. I'm not sure what it does exactly, but the manual mentions that there is one "brute force" way, so I'm assuming there must also be some kind of optimization.

Thank you for your answers.
I looked into this problem more now with the guidance of the given answers. Among other things I found this webpage, http://www.mathmaniacs.org/lessons/C-subsetsum/index.html. It tells about the subset sum problem, which I believe is the problem I described here.
One sentence from the webpage is this:
--
You may want to point out that a number like 2^300 is so large that even a computer counting at a speed of over a million or billion each second, would not reach 2^300 until long after our sun had burned out.
--
Personally I would have more use for this algorithm when comparing a larger amount of file sizes than let's say 10 or less as it is somehow easy to reach the probably biggest sum just by trial and error manually if the number of files is low.
A CD with MP3s can easily have 100 MP3s and a DVD a lot more, which leads to the sun burning out before I have the answer :).
Randomly trying to find the optimum sum can apparently get you pretty close but it can never be guaranteed to be the optimum answer and can also with bad luck be quite far away. Brute-force is the only real way it seems to get the optimum answer and that would take way too long.
So I guess I just continue estimating manually a good combination of files to burn on CDs and DVDs. :)
Thanks for the help. :)

If you're looking for a reasonable heuristic, and the objective is to minimize the number of disks required, here's a simple one you might consider. It's similar to one I used recently for a job-shop problem. I was able to compare it to known optima, and found it provided allocations that were either optimal or extremely close to being optimal.
Suppose B is the size of all files combined and C is the capacity of each disk. Then you will need at least n = roundup(B/C) disks. Try to fit all the files on n disks. If you are able to do so, you're finished, and have an optimal solution. Otherwise, try to fit all the files on n+1 disks. If you are able to do so, you have a heuristic solution; otherwise try to fit the files on n+2 disks, and so on, until you are able to do so.
For any given allocation of files to disks below (which may exceed some disk capacities), let s_i be the combined size of files allocated to disk i, and t = max s_i. We are finished when t <= C.
First, order (and index) the files largest to smallest.
For m >= n disks:
1. Allocate the files to the disks in a back-and-forth way: 1->1, 2->2, ... m->m, m+1->m-1, m+2->m-2, ... 2m->1, 2m+1->2, 2m+2->3 ... 3m->m, 3m+1->m-1, and so on until all files are allocated, with no regard to disk capacity. If t <= C we are finished (and the allocation is optimal if m = n); else go to #2.
2. Attempt to reduce t by moving a file from a disk i with s_i = t to another disk, without increasing t. Continue doing this until t <= C, in which case we are finished (and the allocation is optimal if m = n), or t cannot be reduced further, in which case go to #3.
3. Attempt to reduce t by performing pairwise exchanges between disks. Continue doing this until t <= C, in which case we are finished (and the allocation is optimal if m = n), or t cannot be reduced further with pairwise exchanges. In the latter case, repeat #2, unless no improvement was made the last time #2 was repeated, in which case increment m by one and repeat #1.
In #2 and #3 there are of course different ways to order possible reallocations and pairwise exchanges.
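The lower bound and the first allocation step can be sketched as below. The index pattern given above is ambiguous at the turning points, so this sketch assumes a plain "snake" order (disks 1..m, then m..1, and so on), which is only one possible reading. All function names are made up for the illustration, and steps #2 and #3 are not implemented.

```python
def min_disks(sizes, capacity):
    """Lower bound n = roundup(B / C) on the number of disks."""
    total = sum(sizes)
    return -(-total // capacity)  # integer ceiling division

def snake_allocate(sizes, m):
    """Step #1 (assumed snake order): deal files, largest first,
    back and forth across m disks, ignoring capacity."""
    order = list(range(m)) + list(range(m - 1, -1, -1))  # 0..m-1, m-1..0
    disks = [[] for _ in range(m)]
    for k, size in enumerate(sorted(sizes, reverse=True)):
        disks[order[k % len(order)]].append(size)
    return disks

def max_load(disks):
    """t = max s_i, the combined size on the fullest disk."""
    return max(sum(d) for d in disks)

if __name__ == "__main__":
    sizes = [9, 8, 7, 6, 5, 4]
    n = min_disks(sizes, 13)          # 39 / 13 -> 3 disks
    disks = snake_allocate(sizes, n)
    print(disks, max_load(disks))     # [[9, 4], [8, 5], [7, 6]] 13
```

In this small example t equals the capacity after step #1, so the allocation is already feasible with the minimum number of disks. In general the moves and pairwise exchanges of #2 and #3 would follow.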

How to estimate complex algorithm facility requirements?

I'd like to understand how to efficiently estimate hardware requirements for certain complex algorithms using some well known heuristic approach.
I.e. I'd like to quickly estimate how much computing power is necessary to crack my TEA O(2^32) or XTEA O(2^115.15) in some reasonable time, or the other way round:
Given the computing power of 1000 x 4 GHz quad-core CPUs, how much time would it take to execute a given algorithm?
I'd also be interested in complexity estimates for other algorithms, like O(log N) etc.
regards
bua

OK, so I came up with something like this.
Simplifying by assuming that the CPU clock rate equals the number of instructions executed per second,
and taking e.g. 2^115.15 instructions and a processor with a 1 GHz clock,
which is:
i = 2^115.15
clock = 1 GHz
secperinstr = 1/1e+9
seconds = i * secperinstr
In Python:
import math
def sec(N, cpuSpeedHz):
    # seconds needed to execute 2^N instructions at cpuSpeedHz instructions per second
    instructions = math.pow(2, N)
    return instructions * (1. / cpuSpeedHz)
e.g.
>>> sec(115.15, math.pow(10,9)) / (365*24*60*60)
1.4614952014571389e+18
so it would take about 1.46 * 10^18 years to calculate it.
With 1 million 4-core 1 GHz processors it would take:
>>> sec(115.15, 1000000*4*math.pow(10,9)) / (365*24*60*60)
365373800364.28467
i.e. about 3.65 * 10^11 years (~365 billion years).
Simplified version:
2^115.15 = 2^32 * 2^83.15
clock = 2^32 Hz ~ 4.3 GHz
which leaves 2^83.15 seconds, or in years:
>>> math.pow(2,83.15)/(365*24*60*60)
3.4028086845230746e+17
Checking:
2^32 = 10 ^ 9.63295986
>>> sec(115.15, math.pow(2,32))/(365*24*60*60)
3.4028086845230746e+17

Pick whichever answer you like:
More than you can afford
It would be far, far cheaper to keylog your machine
Where are you going to store the 2^20 plaintexts needed to achieve the O(2^115) time complexity?
A whole bunch
If someone really wants your pr0n collection it is much easier to break the key holder than it is the key.
