Is big-O notation relevant in a concurrent world? [closed]

In most popular languages like C/C++/C#/Erlang/Java we have threads/processes, and the GPGPU computation market is growing. If an algorithm consists of N data-independent steps, we do not get the same performance as if the algorithm required all steps to follow one another. So I wonder whether big-O notation still makes sense in a concurrent world, and if it does not, what is relevant for analyzing algorithm performance?
You can have N or more processors in a distributed environment (GPGPU / cluster / an FPGA of the future where you can get as many cores as you need - a concurrent world, not limited to a fixed number of parallel cores).

Big-O notation is still relevant.
You have a constant number of processors (by assumption), thus only a constant number of things can happen concurrently, thus you can only speed up an algorithm by a constant factor, and big-O ignores constant factors.
So whether we look at the total number of steps, or only at the number of steps taken by the processor that processes the most steps, we end up with exactly the same time complexity, and this still gives us a decent idea of the rate of growth of the time taken in relation to the input size.
... future where you can get as many cores as you need - concurrent world, not limited to the number of parallel cores.
If we even get to the stage where you can run an algorithm with exponential running time on very large input in seconds, then yes, big-O notation, and the study of algorithms, will probably become much less relevant.
But consider, for example: an O(n!) algorithm with n = 1000 (which is pretty small, to be honest) requires in the region of 4x10^2567 steps, which is about 4x10^2487 times more than the roughly 10^80 estimated atoms in the entire observable universe. In short, big-O notation is unlikely to ever become irrelevant.
Even on the assumption of an effectively unlimited number of processors, we can still use big-O notation to indicate the steps taken by the processor processing the most steps (which should indicate the running time).
We could also use it to indicate the number of processors used, if we'd like.
The bottom line is that big-O notation exists to show the rate of growth of a function - a function which could represent just about anything. Just because we typically use it to indicate the number of arithmetic computations, steps, comparisons or similar doesn't mean those are the only things we can use it for.
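As a concrete illustration of the constant-factor point above, here is a minimal step-count sketch (not a real benchmark; the chunk-per-worker model and the worker count of 8 are illustrative assumptions): with any fixed number of workers, the busiest worker still performs Theta(n) additions when summing n numbers, so only the constant changes.

    # A rough step-count model, not a real benchmark: with a fixed number of
    # workers p, the busiest worker still does Theta(n) additions when summing
    # n numbers, so the growth rate is unchanged - only the constant shrinks.
    import math

    def busiest_worker_steps(n: int, p: int) -> int:
        """Additions done by the busiest of p workers that each sum a chunk,
        plus the final combine of the p partial sums."""
        chunk = math.ceil(n / p) - 1   # additions to sum the largest chunk
        combine = p - 1                # additions to merge the partial sums
        return chunk + combine

    for n in (10**3, 10**6, 10**9):
        serial = n - 1
        parallel = busiest_worker_steps(n, p=8)
        print(f"n={n:>10}  serial={serial:>12}  8 workers={parallel:>12}  "
              f"speedup ~ {serial / parallel:.1f}x")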

Big-O is a mathematical concept. It's always relevant: it would have been relevant before computers even existed, it's relevant now, it will always be relevant, and it's even relevant to hypothetical aliens millions of light years away (if they know about it). Big-O is not just something we use to talk about how running time scales; it has a mathematical definition, and it's about functions.
But there are many models of computation (unfortunately many people forget that, and even forget that what they're using is a model at all) and which ones make sense to use is not always the same.
For example, if you're talking about the properties of a parallel algorithm, assuming you have a constant number of processing elements essentially ignores the parallel nature of the algorithm. So in order to be able to express more, a commonly used model (but by no means the only one) is PRAM.
That you don't actually have an unlimited number of processing elements in reality is of no import. It's a model. The whole point is to abstract reality away. You don't have unlimited memory either, which is one of the assumptions of a Turing machine.
For models even further removed from reality, see hypercomputation.
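To make the kind of analysis a model like PRAM enables more concrete, here is a tiny counting sketch (the function names and the tree-shaped summation are illustrative assumptions, not a real parallel program): a tree-shaped sum of n numbers performs Theta(n) total work but has only Theta(log n) span, i.e. time on an unbounded number of processors.

    # A minimal work/span counting sketch (not a real parallel program):
    # a tree-shaped sum of n numbers has Theta(n) total work but only
    # Theta(log n) span, the time on an unbounded number of processors.
    def work(n: int) -> int:
        """Total additions performed - the same as the serial algorithm."""
        return n - 1

    def span(n: int) -> int:
        """Longest chain of dependent additions (the depth of the tree)."""
        depth = 0
        while n > 1:
            n = (n + 1) // 2   # each parallel round halves the partial sums
            depth += 1
        return depth

    for n in (8, 1024, 1_000_000):
        print(f"n={n:>9}  work={work(n):>9}  span={span(n):>3}")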

Multithreading and using GPUs are just uses of parallelization to speed up algorithms. But there are algorithms that cannot be sped up this way.
Even if an algorithm can be sped up by parallelization, an O(N log N) algorithm will still be much faster than an O(N²) algorithm for large enough inputs.
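A back-of-envelope sketch of that claim (the 1000-core count and the perfect speedup are assumptions made purely for illustration): even if the O(N²) algorithm gets a flawless 1000-fold parallel speedup, the single-threaded O(N log N) algorithm overtakes it once N is large enough.

    # Back-of-envelope sketch: grant the O(N^2) algorithm a perfect 1000-fold
    # parallel speedup and compare abstract step counts with a single-threaded
    # O(N log N) algorithm. The constants here are illustrative assumptions.
    import math

    P = 1000  # assumed number of perfectly utilised cores

    for n in (1_000, 10_000, 100_000, 1_000_000):
        quadratic_parallel = n * n / P
        nlogn_serial = n * math.log2(n)
        winner = "N log N" if nlogn_serial < quadratic_parallel else "N^2 / 1000"
        print(f"n={n:>9}  N^2/1000={quadratic_parallel:>14.0f}  "
              f"N log N={nlogn_serial:>14.0f}  faster: {winner}")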

Related

Why is algorithm time complexity often defined in terms of steps/operations?

I've been doing a lot of studying from many different resources on algorithm analysis lately, and one thing I'm currently confused about is why time complexity is often defined in terms of the number of steps/operations an algorithm performs.
For instance, in Introduction to Algorithms, 3rd Edition by Cormen, he states:
The running time of an algorithm on a particular input is the number of primitive operations or “steps” executed. It is convenient to define the notion of step so that it is as machine-independent as possible.
I've seen other resources define time complexity this way as well. I have a problem with this because, for one, it's called TIME complexity, not "step complexity" or "operations complexity." Secondly, while it's not a definitive source, an answer to a post here on Stack Overflow states "Running time is how long it takes a program to run. Time complexity is a description of the asymptotic behavior of running time as input size tends to infinity." Further, the Wikipedia page for time complexity states "In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm." Again, while these aren't definitive sources, things make logical sense using these definitions.
When analyzing an algorithm and deriving its time complexity function, such as in Figure 1 below, you get an equation that is in units of time. It CAN represent the number of operations the algorithm performs, but only if those constant factors (C_1, C_2, C_3, etc.) each have a value of 1.
[Figure 1: the derived running-time equation, with per-operation cost constants C_1, C_2, C_3, ... - image not included]
So with all that said, I'm just wondering how it's possible for this to be defined as the number of steps when that's not really what it represents. I'm trying to clear things up and make the connection between time and number of operations. I feel like there is a lot of information that hasn't been explicitly stated in the resources I've studied. Hoping someone can help clear things up for me, and without going into discussion about Big-O because that shouldn't be needed and misses the point of the question, in my opinion.
Thank you everyone for your time and help.
why time complexity is often defined in terms of the number of steps/operations an algorithm performs?
TL;DR: because that is how asymptotic analysis works; also, do not forget that time is a relative thing.
Longer story:
Measuring performance in time, as we humans understand time in daily use, doesn't make much sense, as it is not always a trivial task to do; furthermore, it makes little sense from a broader perspective.
How would you measure the space and time your algorithm takes? What would be the agreed, predefined unit of measurement you're going to apply to see the running time/space complexity of your algorithm?
You can measure it on your clock, or use some library/API to see exactly how many seconds/minutes/megabytes your algorithm took.
However, all of this will be VERY variable, because the time/space your algorithm takes will depend on:
Particular hardware you're using (architecture, CPU, RAM, etc.);
Particular programming language;
Operating System;
The compiler you used to compile your high-level code into a lower abstraction;
Other environment-specific details (sometimes even the temperature, as CPUs may scale their operating frequency dynamically).
Therefore, it is not a good idea to measure your complexity in precise timings (again, as we understand timing on this planet).
So, if you want to know the complexity (let's say time complexity) of your algorithm, why would it make sense to get a different time on different machines, OSes, etc.? Algorithm complexity analysis is not about measuring runtime on a particular machine, but about having clear, mathematically defined bounds for the best, average and worst cases.
I hope this makes sense.
Fine, so we finally get to the point: algorithm analysis should be done as a standalone, mathematical complexity analysis, which does not care what the machine, OS, system architecture, or anything else is (apart from the algorithm itself), as we need to observe the logical abstraction without caring whether it runs on Windows 10 with an Intel Core 2 Duo, on Arch Linux with an Intel i7, or on your mobile phone.
What's left?
The best way (so far) to analyse an algorithm is asymptotic analysis: an abstract analysis calculated on the basis of the input, counting the steps and operations the algorithm performs in proportion to that input.
This way you can speak about the Algorithm, per se, instead of being dependent on the surrounding circumstances.
Moreover, not only should we not care about machine or peripheral factors, we also shouldn't care about lower-order terms and constant factors in the mathematical expressions of the asymptotic analysis.
Constant Factors:
Constant factors come from instructions which are independent of the input data, i.e. which do NOT depend on the input arguments.
A few reasons why you should ignore them:
Different programming language syntaxes, as well as their compiled files, will have different numbers of constant operations/factors;
Different hardware will give different run-times for the same constant factors.
So, you should stop worrying about constant factors and simply ignore them. Focus only on the input-related factors; therefore:
O(2n) == O(5n), and all these are O(n);
6n² == 10n², and all these are n².
One more reason we don't care about constant factors is that we usually want to measure the complexity for sufficiently large inputs, and when the input grows towards infinity it really makes no difference whether you have n or 2n.
Lower order terms:
A similar idea applies here:
Lower order terms, by definition, become increasingly irrelevant as you focus on large inputs.
When you have 5x⁴ + 24x² + 5, you will never care much about the terms whose exponent is less than 4.
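As a toy illustration of both points (the coefficients below are made up): take an exact step count T(n) = 3n² + 24n + 5 and compare it with its dominant term; the constant factors and lower-order terms become negligible as n grows.

    # Toy illustration: an exact step count T(n) = 3*n**2 + 24*n + 5 (made-up
    # constants) versus its dominant n**2 term. The lower-order terms and the
    # constant factors matter less and less as n grows.
    def exact_steps(n: int) -> int:
        return 3 * n**2 + 24 * n + 5

    for n in (10, 100, 10_000, 1_000_000):
        dominant = 3 * n**2
        ratio = exact_steps(n) / dominant
        print(f"n={n:>9}  exact={exact_steps(n):>16}  3n^2={dominant:>16}  "
              f"exact/3n^2={ratio:.4f}")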
Time complexity is not about measuring how long an algorithm takes in terms of seconds. It's about comparing different algorithms: how they will perform with a certain amount of input data, and how this performance develops when the input data gets bigger.
In this context, the "number of steps" is an abstract stand-in for time that can be compared independently of any hardware. I.e. you can't tell how long it will take to execute 1000 steps without exact specifications of your hardware (and how long one step takes). But you can always tell that executing 2000 steps will take about twice as long as executing 1000 steps.
And you can't really discuss time complexity without going into Big-O, because that's what it is.
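A quick sanity check of that proportionality claim (the loop body and the iteration counts are arbitrary choices): the absolute times below depend entirely on your machine, but the ratio should come out close to 2 regardless.

    # The absolute times printed here are machine-dependent; the ratio between
    # the two measurements should be close to 2 on any machine.
    import timeit

    def busy_loop(steps: int) -> int:
        total = 0
        for i in range(steps):
            total += i          # one cheap "step" per iteration
        return total

    t_small = timeit.timeit(lambda: busy_loop(1_000_000), number=20)
    t_big = timeit.timeit(lambda: busy_loop(2_000_000), number=20)
    print(f"1M steps: {t_small:.3f}s   2M steps: {t_big:.3f}s   "
          f"ratio: {t_big / t_small:.2f}")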
You should note that algorithms are more abstract than programs. You look at two algorithms on paper or in a book and you want to analyze which works faster for input data of size N, so you must analyze them with logic and statements. You can also run them on a computer and measure the time, but that's not proof.
Moreover, different computers execute programs at different speeds. It depends on CPU speed, RAM, and many other conditions. Even a program on a single computer may run at different speeds depending on the resources available at the time.
So, time for algorithms must be independent of how long a single atomic instruction takes to execute on a specific computer; it's considered just one step, or O(1). Also, we aren't interested in constants. For example, it doesn't matter whether a program has two or ten instructions; both will run in a fraction of a microsecond. Usually, the number of such instructions is limited and they all run fast. What matters are instructions or loops whose execution count depends on a variable, which could be the size of the input to the program.

Real running time calculation of matrix multiplication [closed]

I would like to calculate approximately the running time of a matrix multiplication problem. Below are my assumptions:
No parallel programming
A 2 Ghz CPU
A square matrix of size n
An O(n^3) algorithm
For example, suppose that n = 1000. Roughly how much time should I expect squaring this matrix to take under the above assumptions?
Thanks.
This depends terribly on the algorithm and the CPU. Even without parallelization, there's a lot of freedom in how the same steps would be represented on a CPU, and there are differences (in clock cycles needed for various operations) between different CPUs of the same family, too. Don't forget, either, that modern CPUs add some instruction-level parallelism of their own. Optimization done by the compiler will make a difference by reordering memory accesses and branches, and will likely convert instructions to vectorized ones even if you didn't ask for that. Depending on further factors, it may also matter whether your matrices are at a fixed location in memory or accessed through a pointer, and whether they are allocated with a fixed size or each row/column dynamically. Don't forget about memory caching, page invalidations, and operating system scheduling, as I did in previous versions of my answer.
If this is for your own rough estimate or for a "typical" case, you won't do much wrong by just writing the program, running it in your specific conditions (as discussed above) in many repetitions for n = 1000, and calculating the average.
If you want a lot of hard work for a worse result, you can actually do what you probably meant to do in your original question yourself:
see what instructions your specific compiler produces for your specific algorithm under your specific conditions and with specific optimization settings (like here)
pick your specific processor and find its latency table for every instruction that's there,
add them up per iteration and multiply by 1000^3,
divide by the clock frequency.
Seriously, it's not worth the effort, a benchmark is faster, clearer, and more precise anyway (as this does not account for what happens in the branch predictor and hyperthreading and memory caching and other architectural details). If you want an exercise I'll leave that to you.
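For what it's worth, here is a sketch of both approaches for the original question. The assumption that a 2 GHz CPU retires roughly one floating-point operation per cycle is a deliberate over-simplification, and NumPy is used only for convenience; its optimized routine will be far faster than a naive triple loop on the same hardware.

    # First the back-of-envelope estimate the question asks for, then the
    # measurement this answer recommends.
    import time
    import numpy as np

    n = 1000
    cpu_hz = 2e9                          # the 2 GHz CPU from the question

    flops = 2 * n**3                      # n^3 multiply-adds ~ 2*n^3 flops
    print(f"estimate: {flops:.1e} flops / {cpu_hz:.0e} flops/s "
          f"~ {flops / cpu_hz:.1f} s (ignores caches, SIMD, pipelining, ...)")

    a = np.random.rand(n, n)
    times = []
    for _ in range(5):
        start = time.perf_counter()
        _ = a @ a                         # square the matrix
        times.append(time.perf_counter() - start)
    print(f"measured with NumPy (best of 5): {min(times):.3f} s")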

Why are numbers rounded in standard computer algorithms? [closed]

I understand that this makes the algorithms faster and use less storage space, and that these would have been critical features for software to run on the hardware of previous decades, but is this still an important feature? If the calculations were done with exact rational arithmetic then there would be no rounding errors at all, which would simplify many algorithms as you would no longer have to worry about catastrophic cancellation or anything like that.
Floating point is much faster than arbitrary-precision and symbolic packages, and 12-16 significant figures is usually plenty for demanding science/engineering applications where non-integral computations are relevant.
The programming language ABC used rational numbers (x / y where x and y were integers) wherever possible.
Sometimes calculations would become very slow because the numerator and denominator had become very big.
So it turns out that it's a bad idea if you don't put some kind of limit on the numerator and denominator.
In the vast majority of computations, the size of the numbers required to compute answers exactly would quickly grow beyond the point where computation would be worth the effort, and in many calculations it would grow beyond the point where exact calculation would even be possible. Consider that even running something like a simple third-order IIR filter for a dozen iterations would require a fraction with thousands of bits in the denominator; running the algorithm for a few thousand iterations (hardly an unusual operation) could require more bits in the denominator than there exist atoms in the universe.
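A small sketch of that growth, using a simpler first-order recurrence instead of the third-order IIR filter; the coefficients and inputs are made up, and how fast the denominator grows depends entirely on them.

    # A first-order recurrence with exact rational arithmetic; watch the
    # denominator's bit length grow with every iteration. (The coefficient
    # and the inputs here are made up.)
    from fractions import Fraction

    a = Fraction(2, 3)
    y = Fraction(1)
    for step in range(1, 41):
        x = Fraction(1, 2 * step + 3)      # exact, but ever-changing, input
        y = a * y + x
        if step % 10 == 0:
            print(f"step {step:>2}: denominator needs "
                  f"{y.denominator.bit_length()} bits")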
Many numerical algorithms still require fixed-precision numbers in order to perform well enough. Such calculations can be implemented in hardware because the numbers fit entirely in registers, whereas arbitrary precision calculations must be implemented in software, and there is a massive performance difference between the two. Ask anybody who crunches numbers for a living whether they'd be ok with things running X amount slower, and they probably will say "no that's completely unworkable."
Also, I think you'll find that having arbitrary precision is impractical and even impossible. For example, the number of decimal places can grow fast enough that you'll want to drop some. And then you're back to square one: rounded number problems!
Finally, sometimes the numbers beyond a certain precision do not matter anyway. For example, generally the number of significant digits should reflect the level of experimental uncertainty.
So, which algorithms do you have in mind?
Traditionally integer arithmetic is easier and cheaper to implement in hardware (uses less space on the die so you can fit more units on there). Especially when you go into the DSP segment this can make a lot of difference.

Why is a program's running time not a measure?

I have learned that a program is measured by its complexity - I mean by Big O notation.
Why don't we measure it by its absolute running time?
thanks :)
You use the complexity of an algorithm instead of absolute running times to reason about algorithms, because the absolute running time of a program does not only depend on the algorithm used and the size of the input. It also depends on the machine it's running on, various implementation details, and what other programs are currently using system resources. Even if you run the same application twice with the same input on the same machine, you won't get exactly the same time.
Consequently when given a program you can't just make a statement like "this program will take 20*n seconds when run with an input of size n" because the program's running time depends on a lot more factors than the input size. You can however make a statement like "this program's running time is in O(n)", so that's a lot more useful.
Absolute running time is not an indicator of how the algorithm grows with different input sets. It's possible for an O(n*log(n)) algorithm to be far slower than an O(n^2) algorithm for all practical datasets.
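A sketch of that claim, with an assumed constant factor of 100 attached to the O(n*log(n)) algorithm: below the crossover point, the plain n² algorithm performs fewer abstract steps.

    # Attach a large (assumed) constant factor of 100 to the O(n log n)
    # algorithm and compare abstract step counts with a plain n^2 algorithm.
    import math

    def nlogn_steps(n: int) -> float:
        return 100 * n * math.log2(n)    # hypothetical heavyweight O(n log n)

    def quadratic_steps(n: int) -> float:
        return n * n                     # lightweight O(n^2)

    for n in (10, 100, 1_000, 10_000):
        fast_in_theory = nlogn_steps(n)
        slow_in_theory = quadratic_steps(n)
        winner = "n^2" if slow_in_theory < fast_in_theory else "100 n log n"
        print(f"n={n:>6}  100*n*log2(n)={fast_in_theory:>12.0f}  "
              f"n^2={slow_in_theory:>12.0f}  faster: {winner}")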
Running time does not measure complexity, it only measures performance, or the time required to perform the task. An MP3 player will run for the length of time required to play the song. The elapsed CPU time may be more useful in this case.
One measure of complexity is how it scales to larger inputs. This is useful for planning the required hardware. All things being equal, something that scales relatively linearly is preferable to something which scales poorly. Things are rarely equal.
The other measure of complexity is a measure of how simple the code is. The code complexity is usually higher for programs with relatively linear performance complexity. Complex code can be costly to maintain, and changes are more likely to introduce errors.
All three (or four) measures are useful, and none of them are highly useful by themselves. The three together can be quite useful.
The question could use a little more context.
In programming a real program, we are likely to measure the program's running time. There are multiple potential issues with this, though:
1. What hardware is the program running on? Comparing two programs running on different hardware really doesn't give a meaningful comparison.
2. What other software is running? If anything else is running, it's going to steal CPU cycles (or whatever other resource your program is relying on).
3. What is the input? As already said, for a small set, a solution might look very fast, but scalability goes out the door. Also, some inputs are easier than others. If as a person, you hand me a dictionary and ask me to sort, I'll hand it right back and say done. Giving me a set of 50 cards (much smaller than a dictionary) in random order will take me a lot longer to do.
4. What are the starting conditions? If your program runs for the first time, chances are that loading it off the hard disk will take up the largest chunk of time on modern systems. Comparing two implementations with small inputs will likely have their differences masked by this.
Big O notation covers a lot of these issues.
1. Hardware doesn't matter, as everything is normalized by the speed of 1 operation O(1).
2. Big O talks about the algorithm free of other algorithms around it.
3. Big O talks about how the input will change the running time, not how long one input takes. It tells you the worst the algorithm will perform, not how it performs on an average or easy input.
4. Again, Big O handles algorithms, not programs running in a physical system.

What can be parameters other than time and space while analyzing certain algorithms?

I was interested to know about parameters other than space and time when analysing the effectiveness of an algorithm. For example, we can focus on the effective trap function while developing encryption algorithms. What other things can you think of?
First and foremost there's correctness. Make sure your algorithm always works, no matter what the input. Even for input that the algorithm is not designed to handle, you should print an error message, not crash the entire application. If you use greedy algorithms, make sure they truly work in every case, not just a few cases you tried by hand.
Then there's practical efficiency. An O(N²) algorithm can be a lot faster than an O(N) algorithm in practice. Do actual tests and don't rely on theoretical results too much.
Then there's ease of implementation. You usually don't need the best intro sort implementation to sort an array of 100 integers once, so don't bother.
Look for worst cases in your algorithms and if possible, try to avoid them. If you have a generally fast algorithm but with a very bad worst case, consider detecting that worst case and solving it using another algorithm that is generally slower but better for that single case.
Consider space and time tradeoffs. If you can afford the memory in order to get better speeds, there's probably no reason not to do it, especially if you really need the speed. If you can't afford the memory but can afford to be slower, do that.
If you can, use existing libraries. Don't roll your own multiprecision library if you can use GMP for example. For C++, stuff like boost and even the STL containers and algorithms have been worked on for years by an army of people and are most likely better than you can do alone.
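As a small illustration of the space/time trade-off mentioned above (Fibonacci is only a stand-in example): caching spends O(n) extra memory to turn an exponential-time recursion into a linear-time one.

    # Memoisation trades O(n) extra memory for a dramatic speedup.
    from functools import lru_cache

    def fib_slow(n: int) -> int:
        """Plain recursion: exponential time, no extra memory beyond the stack."""
        return n if n < 2 else fib_slow(n - 1) + fib_slow(n - 2)

    @lru_cache(maxsize=None)
    def fib_fast(n: int) -> int:
        """Same recursion plus a cache: linear time, O(n) extra memory."""
        return n if n < 2 else fib_fast(n - 1) + fib_fast(n - 2)

    print(fib_slow(30))   # already noticeably slow
    print(fib_fast(300))  # instant, at the cost of 300 cached entries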
Stability (sorting) - Does the algorithm maintain the relative order of equal elements?
Numeric Stability - Is the algorithm prone to error when very large or small real numbers are used?
Correctness - Does the algorithm always give the correct answer? If not, what is the margin of error?
Generality - Does the algorithm work in many situation (e.g. with many different data types)?
Compactness - Is the program for the algorithm concise?
Parallelizability - How well does performance scale when the number of concurrent threads of execution are increased?
Cache Awareness - Is the algorithm designed to maximize use of the computer's cache?
Cache Obliviousness - Is the algorithm tuned for particular cache sizes / cache-line sizes, or does it perform well regardless of the parameters of the cache?
Complexity. Given two algorithms that are the same in all other respects, the one that's much simpler is going to be a much better candidate for future customization and use.
Ease of parallelization. Depending on your use case, it might not make any difference or, on the other hand, make the algorithm useless because it can't use 10000 cores.
Stability - some algorithms may "blow up" with certain test conditions, e.g. take an inordinately long time to execute, or use an inordinately large amount of memory, or perhaps not even terminate.
For algorithms that perform floating point operations, the accumulation of round-off error is often a consideration.
Power consumption, for embedded algorithms (think smartcards).
One important parameter that is frequently measured in the analysis of algorithms is that of cache hits and cache misses. While this is a very implementation- and architecture-dependent issue, it is possible to generalise somewhat. One particularly interesting property is being cache-oblivious, which means that the algorithm will use the cache optimally on multiple machines with different cache sizes and structures without modification.
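A rough demonstration of why cache behaviour matters (NumPy and the matrix size are assumptions, and the exact numbers are machine-dependent): traversing a row-major matrix along its contiguous rows is typically much faster than striding down its columns, even though both loops do the same number of additions.

    # Column-wise access of a row-major matrix touches a new cache line for
    # every element, while row-wise access streams through contiguous memory.
    import time
    import numpy as np

    a = np.random.rand(4000, 4000)        # row-major (C order) by default

    def best_of(fn, repeats=3):
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn()
            best = min(best, time.perf_counter() - start)
        return best

    row_wise = best_of(lambda: sum(float(row.sum()) for row in a))
    col_wise = best_of(lambda: sum(float(a[:, j].sum()) for j in range(a.shape[1])))
    print(f"row-wise: {row_wise:.3f} s   column-wise: {col_wise:.3f} s")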
Time and space are the big ones, and they seem so plain and definitive, yet they should often be qualified (1). The fact that the OP uses the word "parameter" rather than, say, "criteria" or "properties" is somewhat indicative of this (as if a big-O value on time and on space were sufficient to frame the underlying algorithm).
Other criteria include:
domain of applicability
complexity
mathematical tractability
definitiveness of outcome
ease of tuning (may be tied to the "complexity" and "tractability" mentioned above)
ability of running the algorithm in a parallel fashion
(1) "qualified": As hinted in other answers, a -technically- O(n^2) algorithm may be found to be faster than say an O(n) algorithm, in 90% of the cases (which, btw, may turn out to be 100% of the practical cases)
Worst case and best case are also interesting, especially when linked to some conditions on the input. If your input data exhibits certain properties, an algorithm that takes advantage of those properties may perform better than another algorithm which performs the same task but does not use them.
For example, many sorting algorithms perform very efficiently when the input is already partially ordered in a way that minimizes the number of operations the algorithm has to execute.
(If your input is mostly sorted, an insertion sort will fit nicely, while you would never use that algorithm otherwise.)
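A sketch of that insertion-sort point: counting the comparisons it performs on an already-sorted input versus a shuffled one of the same size.

    # Insertion sort does about n comparisons on sorted input, but about n^2/4
    # on random input of the same size.
    import random

    def insertion_sort_comparisons(data):
        a = list(data)
        comparisons = 0
        for i in range(1, len(a)):
            key = a[i]
            j = i - 1
            while j >= 0:
                comparisons += 1
                if a[j] <= key:
                    break
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = key
        return comparisons

    n = 2000
    sorted_input = list(range(n))
    random_input = random.sample(range(n), n)
    print("sorted input:", insertion_sort_comparisons(sorted_input), "comparisons")
    print("random input:", insertion_sort_comparisons(random_input), "comparisons")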
If we're talking about algorithms in general, then (in the real world) you might have to think about CPU/filesystem(read/write operations)/bandwidth usage.
True, they are way down the list of things you need to worry about these days, but given a massive enough volume of data and cheap enough infrastructure you might have to tweak your code to ease up on one or the other.
What you are interested aren’t parameters, rather they are intrinsic properties of an algorithm.
Anyway, another property you might be interested in, and analyse an algorithm for, concerns heuristics (or rather, approximation algorithms), i.e. algorithms which don’t find an exact solution but rather one that is (hopefully) good enough.
You can analyze how far a solution is from the theoretical optimal solution in the worst case. For example, an existing algorithm (forgot which one) approximates the optimal travelling salesman tour by a factor of two, i.e. in the worst case it’s twice as long as the optimal tour.
Another metric concerns randomized algorithms, where randomization is used to prevent unwanted worst-case behaviour. One example is randomized quicksort; quicksort has a worst-case running time of O(n²), which we want to avoid. By shuffling the array beforehand we can avoid the worst case (i.e. an already sorted array) with very high probability. Just how high this probability is can be important to know; this is another intrinsic property of the algorithm that can be analyzed using probability theory.
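A sketch of the randomisation idea (the comparison-counting quicksort below uses the first element as pivot, a deliberately bad choice that exposes the worst case; one conceptual comparison is counted per element partitioned against the pivot): shuffling the input first makes the quadratic behaviour on sorted input extremely unlikely.

    # Quicksort with a first-element pivot degrades to ~n^2/2 comparisons on
    # sorted input; shuffling the input beforehand avoids that with very high
    # probability.
    import random

    def quicksort_comparisons(a):
        """Comparisons made by first-element-pivot quicksort (one per element
        partitioned against the pivot)."""
        if len(a) <= 1:
            return 0
        pivot, rest = a[0], a[1:]
        left = [x for x in rest if x < pivot]
        right = [x for x in rest if x >= pivot]
        return len(rest) + quicksort_comparisons(left) + quicksort_comparisons(right)

    n = 700  # small enough to stay within Python's default recursion limit
    sorted_input = list(range(n))
    shuffled = sorted_input[:]
    random.shuffle(shuffled)

    print("sorted input (worst case):", quicksort_comparisons(sorted_input))
    print("shuffled first:           ", quicksort_comparisons(shuffled))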
For numeric algorithms, there's also the property of continuity: that is, whether, if you change the input slightly, the output also changes only slightly. See also Continuity analysis of programs on Lambda The Ultimate for a discussion and a link to an academic paper.
For lazy languages, there's also strictness: f is called strict if f _|_ = _|_ (where _|_ denotes the bottom (in the sense of domain theory), a computation that can't produce a result due to non-termination, errors etc.), otherwise it is non-strict. For example, the function \x -> 5 is non-strict, because (\x -> 5) _|_ = 5, whereas \x -> x + 1 is strict.
Another property is determinicity: whether the result of the algorithm (or its other properties, such as running time or space consumption) depends solely on its input.
All these things in the other answers about the quality of various algorithms are important and should be considered.
But time and space are two things that vary at some rate compared to the size of the input (n). So what else can vary according to n?
There are several that are related to I/O. For example, the number of writes to a disk is an important one, which may not be directly shown by space and time estimates alone. This becomes particularly important with flash memory, where the number of writes to the same memory location is the significant metric in some algorithms.
Another I/O metric would be "chattiness". A networking protocol might send shorter messages more often, adding up to the same space and time as another networking protocol, but some aspect of the system (perhaps billing?) might make minimizing either the size or the number of the messages desirable.
And that brings us to Cost, which is a very important algorithmic consideration sometimes. The cost of an algorithm may be affected by both space and time in different amounts (consider the separate costing of server storage space and gigabits of data transfer), but the cost is the thing that you wish to minimize overall, so it may have its own big-O estimations.
