Calculating time/space complexity in lazily evaluated languages - lazy-evaluation

Can anyone tell me how one goes about calculating the time/space complexity of programs in lazily evaluated languages? I hear it's difficult, but I would like to give it a try.
Thanks in advance

At least for the case of data structures (and I dare wager that generalises to plain code), you will find information in Okasaki's "Purely Functional Data Structures". Other papers of his might be valuable to your undertaking, too.
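To give a rough feel for why the accounting is awkward, here is a sketch in Python. Python generators are not the same thing as Haskell's call-by-need (in particular there is no sharing/memoization of forced thunks), so treat this only as an illustration of the core issue: the cost of building a lazy computation is trivial, and the real work is charged wherever the result is eventually demanded.

```python
# Building the pipeline is O(1); the work happens only when (and if) results
# are demanded. Attributing that deferred cost is what makes the analysis hard.
import itertools

def squares():
    # An "infinite" lazy sequence: no work is done at definition time.
    n = 0
    while True:
        yield n * n
        n += 1

pipeline = (x for x in squares() if x % 3 == 0)   # O(1): nothing evaluated yet

first_five = list(itertools.islice(pipeline, 5))  # the cost is paid here, on demand
print(first_five)                                 # [0, 9, 36, 81, 144]
```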

Related

What is the core difference between algorithm and pseudocode?

As the question title says: "What is the core difference between algorithm and pseudocode?"
Algorithm
An algorithm is a procedure for solving a problem in terms of the actions to be executed and the order in which those actions are to be executed. An algorithm is merely the sequence of steps taken to solve a problem. The steps are normally "sequence," "selection," "iteration," and a case-type statement.
Pseudocode
Pseudocode is an artificial and informal language that helps programmers develop algorithms. Pseudocode is a "text-based" detail (algorithmic) design tool.
The rules of pseudocode are reasonably straightforward. All statements showing "dependency" are to be indented. These include while, do, for, if, switch. The example below illustrates this notion.
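As a small illustration of the indentation rule, here is a hypothetical routine for finding the largest value in a list. It is written in Python only because Python reads almost like pseudocode; the function name and data are made up.

```python
# Every statement that "depends" on the for/if above it is indented one level.
def find_largest(values):
    largest = values[0]
    for value in values:        # iteration
        if value > largest:     # selection
            largest = value     # depends on the if above
    return largest

print(find_largest([3, 7, 2, 9, 4]))  # 9
```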
I think all the other answers give useful explanations and definitions, but I'm going to give mine.
An algorithm is the idea of how to obtain some result from some input. It is an abstract concept; an algorithm is not something material by itself, but more something like an imagination or a deduction, a thing that only exists in the mind. In the broad sense, any sequence of steps that gives you some thing(s) from other thing(s) could be called an algorithm. For example, if the screen of your computer is dirty, "spray some glass cleaner on it and wipe it with a cloth" could be said to be an algorithm to solve the problem of how to obtain a clean screen from a dirty screen. It is important to note the difference between the problem itself (getting a clean screen) and the algorithm (wiping it with a cloth and cleaner); generally, several different algorithms are possible to solve the same problem. The idea of complexity is inherent to the algorithm itself, not the problem or the particular implementation or execution of the algorithm.
Pseudocode is a language to express algorithms. Since, as said before, algorithms are only concepts, we need to use something to express them and explain them to other people. Pseudocode is a convenient way for many computer science algorithms, because it is usually unambiguous, easy to read and somewhat similar to many programming languages. However, a specific programming language like C or Java can also be used to express an algorithm (it's just less convenient to those not familiar with that language). In other cases, pseudocode may not be the best way to express an algorithm; for example, many graph and tree algorithms can be explained more easily with drawings or diagrams. In the previous example, the algorithm to get your screen cleaned is probably better expressed in a natural language like English, because it is simple and specific enough for that case.
Obviously, terms are frequently used loosely and exchanged depending on the context, and there's no need to be nitpicky about it, but I think it is important to have the difference clear. An algorithm doesn't stop being an algorithm just because it is written in Python instead of pseudocode. Pseudocode is just a convenient and widespread communication tool to express them.
An algorithm is something (a sequence of steps) you can do. Pseudocode is a notation to describe an algorithm.
An algorithm is something that can be expressed in mathematical terms, and it lends itself to analysis: complexity considerations (best-, average- and worst-case analysis, etc.). Pseudocode is a human-readable representation of a program.
From Wikipedia:
Starting from an initial state and initial input (perhaps empty), the instructions describe a computation that, when executed, proceeds through a finite number of well-defined successive states, eventually producing "output" and terminating at a final ending state. 
With a pseudo-language one can describe an algorithm without using a programming language such as C.
Flow charts are one example of such a pseudo-language.

Efficient way of Writing algorithm

I was wondering: when someone asks you to solve an algorithmic problem, is it a good idea to start off with a Hashtable, HashSet or HashMap? I have often heard people say you shouldn't reach for hashes as your first answer.
So how should we approach such problems: should in-place solutions be given priority, or should we make sure time complexity is the best possible?
I'm not trying to generalise, but still some suggestions would be helpful.
Thanks
The best you can hope for is a generalized answer for your generalized question.
It depends.
The reason there are many different algorithms is that no single algorithm is always the best, and many algorithms aim to solve different problems from each other. For some algorithms it makes no sense to even talk about hash tables.
If someone asks me to solve an algorithmic problem though, I will probably try to use something that is built in to the language I'm using before designing my own algorithm. The reason is because I value my time. If I find later that the code is not efficient enough, then I can look for a better way to do it.
I think it is really situational. If random access is a priority, you need fast lookups, memory utilization is not tightly constrained, and you don't need sequential access, then a hashtable (or one of its relatives) is the choice.
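To make the in-place-versus-time trade-off concrete, here is a small hypothetical sketch: two ways of checking whether two lists share an element. The function names and data are made up.

```python
def share_element_sorted(a, b):
    # O((n+m) log(n+m)) time, O(1) extra space if you are allowed to sort in place.
    a.sort()
    b.sort()
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            return True
        if a[i] < b[j]:
            i += 1
        else:
            j += 1
    return False

def share_element_hashed(a, b):
    # O(n + m) average time, but O(n) extra memory for the set.
    seen = set(a)
    return any(x in seen for x in b)

print(share_element_sorted([4, 1, 9], [7, 3, 9]))  # True
print(share_element_hashed([4, 1, 9], [7, 3, 2]))  # False
```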

Do you use Big-O complexity evaluation in the 'real world'?

Recently in an interview I was asked several questions related to the Big-O of various algorithms that came up in the course of the technical questions. I don't think I did very well on this... In the ten years since I took programming courses where we were asked to calculate the Big-O of algorithms, I have not had one discussion about the 'Big-O' of anything I have worked on or designed. I have been involved in many discussions with other team members and with the architects I have worked with about the complexity and speed of code, but I have never been part of a team that actually used Big-O calculations on a real project. The discussions are always "is there a better or more efficient way to do this given our understanding of our data?", never "what is the complexity of this algorithm?"
I was wondering if people actually have discussions about the "Big-O" of their code in the real world?
It's not so much about using it; it's more that you understand the implications.
There are programmers who do not realise the consequence of using an O(N^2) sorting algorithm.
I doubt many apart from those working in academia would use Big-O Complexity Analysis in anger day-to-day.
No needless n-squared
In my experience you don't have many discussions about it, because it doesn't need discussing. In practice, in my experience, all that ever happens is you discover something is slow and see that it's O(n^2) when in fact it could be O(n log n) or O(n), and then you go and change it. There's no discussion other than "that's n-squared, go fix it".
So yes, in my experience you do use it pretty commonly, but only in the basest sense of "decrease the order of the polynomial", and not in some highly tuned analysis of "yes, but if we switch to this crazy algorithm, we'll go from log N down to the inverse of Ackermann's function" or some such nonsense. Anything less than a polynomial, and the theory goes out the window and you switch to profiling (e.g. even to decide between O(n) and O(n log n), measure real data).
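To make the "decrease the order" point concrete, here is a hedged sketch; the duplicate-detection task is just a stand-in example, not anything from a real project.

```python
# The same question at three different complexities: does a list contain a duplicate?
def has_duplicate_quadratic(items):
    # O(n^2): compare every pair.
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False

def has_duplicate_nlogn(items):
    # O(n log n): sort a copy, then scan adjacent elements.
    ordered = sorted(items)
    return any(a == b for a, b in zip(ordered, ordered[1:]))

def has_duplicate_linear(items):
    # O(n) average: hash-set membership.
    return len(set(items)) != len(items)

data = [5, 3, 8, 3, 1]
print(has_duplicate_quadratic(data), has_duplicate_nlogn(data), has_duplicate_linear(data))
```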
Big-O notation is rather theoretical, while in practice, you are more interested in actual profiling results which give you a hard number as to how your performance is.
You might have two sorting algorithms which by the book have O(n^2) and O(nlogn) upper bounds, but profiling results might show that the more efficient one might have some overhead (which is not reflected in the theoretical bound you found for it) and for the specific problem set you are dealing with, you might choose the theoretically-less-efficient sorting algorithm.
Bottom line: in real life, profiling results usually take precedence over theoretical runtime bounds.
I do, all the time. When you have to deal with "large" numbers, typically in my case: users, rows in database, promotion codes, etc., you have to know and take into account the Big-O of your algorithms.
For example, an algorithm that generates random promotion codes for distribution could be used to generate billions of codes... Using an O(N^2) algorithm to generate unique codes means weeks of CPU time, whereas an O(N) one means hours.
Another typical example is queries in code (bad!). People query a table and then perform another query for each row... this pushes the order up to N^2. You can usually change the code to use SQL properly and get down to orders of N or N log N.
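Here is a hypothetical illustration of that query-per-row pattern, using an in-memory SQLite database with made-up customers/orders tables; the point is the shape of the code, not the specific schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Linus');
    INSERT INTO orders VALUES (10, 1, 9.5), (11, 2, 3.0), (12, 1, 7.25);
""")

# Anti-pattern: one query per row -- N extra round trips, and worse if each
# lookup is itself a scan.
rows = conn.execute("SELECT id, customer_id, total FROM orders").fetchall()
for order_id, customer_id, total in rows:
    (name,) = conn.execute(
        "SELECT name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()

# Letting SQL do the work: a single JOIN.
report = conn.execute("""
    SELECT o.id, c.name, o.total
    FROM orders o JOIN customers c ON c.id = o.customer_id
""").fetchall()
print(report)
```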
So, in my experience, profiling is useful ONLY AFTER the correct class of algorithm is used. I use profiling to catch bad behaviour, e.g. to understand why an application bound by "small" numbers under-performs.
The answer from my personal experience is: no. Probably the reason is that I use only simple, well-understood algorithms and data structures. Their complexity analysis was already done and published decades ago. Why we should avoid fancy algorithms is better explained by Rob Pike here. In short, a practitioner almost never has to invent new algorithms and, as a consequence, almost never has to use Big-O.
Well that doesn't mean that you should not be proficient in Big-O. A project might demand the design and analysis of an altogether new algorithm. For some real-world examples, please read the "war stories" in Skiena's The Algorithm Design Manual.
To the extent that I know that three nested for-loops are probably worse than a single for-loop. In other words, I use it as a reference gut feeling.
I have never calculated an algorithm's Big-O outside of academia. If I have two ways to approach a certain problem, if my gut feeling says that one will have a lower Big-O than the other one, I'll probably instinctively take the smaller one, without further analysis.
On the other hand, if I know for certain the size of n that comes into my algorithm, and I know for certain it to be relatively small (say, under 100 elements), I might take the most legible one (I like to know what my code does even one month after it has been written). After all, the difference between 100^2 and 100^3 executions is hardly noticeable by the user with today's computers (until proven otherwise).
But, as others have pointed out, the profiler has the last and definite word: If the code I write executes slowly, I trust the profiler more than any theoretical rule, and fix accordingly.
I try to hold off optimizations until profiling data proves they are needed. Unless, of course, it is blatantly obvious at design time that one algorithm will be more efficient than the other options (without adding too much complexity to the project).
Yes, I use it. And no, it's not often "discussed", just like we don't often discuss whether "orderCount" or "xyz" is a better variable name.
Usually, you don't sit down and analyze it, but you develop a gut feeling based on what you know, and can pretty much estimate the O-complexity on the fly in most cases.
I typically give it a moment's thought when I have to perform a lot of list operations. Am I doing any needless O(n^2) complexity stuff, that could have been done in linear time? How many passes am I making over the list? It's not something you need to make a formal analysis of, but without knowledge of big-O notation, it becomes a lot harder to do accurately.
If you want your software to perform acceptably on larger input sizes, then you need to consider the big-O complexity of your algorithms, formally or informally. Profiling is great for telling you how the program performs now, but if you're using an O(2^n) algorithm, your profiler will tell you that everything is just fine as long as your input size is tiny. And then your input size grows, and runtime explodes.
People often dismiss big-O notation as "theoretical", or "useless", or "less important than profiling". Which just indicates that they don't understand what big-O complexity is for. It solves a different problem than a profiler does. Both are essential in writing software with good performance. But profiling is ultimately a reactive tool. It tells you where your problem is, once the problem exists.
Big-O complexity proactively tells you which parts of your code are going to blow up if you run it on larger inputs. A profiler can not tell you that.
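A hedged sketch of that "it profiled fine on small inputs" trap, using the classic Fibonacci example (chosen purely for illustration): the naive exponential recursion looks harmless on tiny inputs, while the memoized linear version stays flat as n grows.

```python
from functools import lru_cache

def fib_naive(n):
    # O(2^n)-ish: recomputes the same subproblems over and over.
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    # O(n): each subproblem is computed once and cached.
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

print(fib_naive(20))   # fast enough that a profiler sees no problem
print(fib_memo(200))   # still instant; fib_naive(200) would never finish
```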
No. I don't use Big-O complexity in 'real world' situations.
My view on the whole issue is this (maybe wrong, but it's just my take):
The Big-O complexity stuff is ultimately there to help you understand how efficient an algorithm is. If, from experience or by other means, you understand the algorithms you are dealing with, and are able to use the right algorithm in the right place, that's all that matters.
If you know this Big-O stuff and are able to use it properly, well and good.
If you don't know how to talk about algorithms and their efficiency in the mathematical way - the Big-O stuff - but you know what really matters - the best algorithm to use in a situation - that's perfectly fine.
If you don't know either, it's bad.
Although you rarely need to do deep big-o analysis of a piece of code, it's important to know what it means and to be able to quickly evaluate the complexity of the code you're writing and the consequences it might have.
At development time you often feel like it's "good enough". Eh, no one will ever put more than 100 elements in this array, right? Then, one day, someone will put 1000 elements in the array (trust users on that: if the code allows it, one of them will do it). And that n^2 algorithm that was good enough is now a big performance problem.
It's sometimes useful the other way around: if you know that you functionally have to make n^2 operations and the complexity of your algorithm happens to be n^3, there might be something you can do about it to make it n^2. Once it's n^2, you'll have to work on smaller optimizations.
On the contrary, if you just wrote a sorting algorithm and find out it has linear complexity, you can be sure that there's a problem with it. (Of course, in real life, occasions where you have to write your own sorting algorithm are rare, but I once saw someone in an interview who was plainly satisfied with his single-for-loop sorting algorithm.)
Yes, for server-side code, one bottleneck can mean you can't scale, because you get diminishing returns no matter how much hardware you throw at the problem.
That being said, there are often other reasons for scalability problems, such as blocking on file- and network-access, which are much slower than any internal computation you'll see, which is why profiling is more important than BigO.

Do you find cyclomatic complexity a useful measure?

I've been playing around with measuring the cyclomatic complexity of a big code base.
Cyclomatic complexity is the number of linearly independent paths through a program's source code and there are lots of free tools for your language of choice.
The results are interesting but not surprising. That is, the parts I know to be the hairiest were in fact the most complex (with a rating of > 50). But what I am finding useful is that a concrete "badness" number is assigned to each method as something I can point to when deciding where to start refactoring.
Do you use cyclomatic complexity? What's the most complex bit of code you found?
We refactor mercilessly, and use Cyclomatic complexity as one of the metrics that gets code on our 'hit list'. 1-6 we don't flag for complexity (although it could get questioned for other reasons), 7-9 is questionable, and any method over 10 is assumed to be bad unless proven otherwise.
The worst we've seen was 87 from a monstrous if-else-if chain in some legacy code we had to take over.
Actually, cyclomatic complexity can be put to use beyond just method level thresholds. For starters, one big method with high complexity may be broken into several small methods with lower complexity. But has it really improved the codebase? Granted, you may get somewhat better readability by all those method names. But the total conditional logic hasn't changed. And the total conditional logic can often be reduced by replacing conditionals with polymorphism.
We need a metric that doesn't turn green by mere method decomposition. I call this CC100.
CC100 = 100 * (Total cyclomatic complexity of codebase) / (Total lines of code)
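A minimal sketch of that CC100 idea (the metric is the answerer's own invention, so the method names and numbers below are purely illustrative):

```python
def cc100(per_method_complexity, total_lines_of_code):
    # 100 * (total cyclomatic complexity of codebase) / (total lines of code)
    total_cc = sum(per_method_complexity.values())
    return 100.0 * total_cc / total_lines_of_code

# Hypothetical codebase: three methods, 400 lines in total.
methods = {"parse_order": 12, "validate": 7, "render_report": 5}
print(cc100(methods, 400))  # 6.0 -- mere method decomposition leaves this roughly unchanged
```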
It's useful to me in the same way that big-O is useful: I know what it is, and can use it to get a gut feeling for whether a method is good or bad, but I don't need to compute it for every function I've written.
I think simpler metrics, like LOC, are at least as good in most cases. If a function doesn't fit on one screen, it almost doesn't matter how simple it is. If a function takes 20 parameters and makes 40 local variables, it doesn't matter if its cyclomatic complexity is 1.
Until there is a tool that can work well with C++ templates and metaprogramming techniques, it's not much help in my situation. Anyway, just remember that
"not all things that count can be measured, and not all things that can be measured count" - Einstein
So remember to pass any information of this type through human filtering too.
We recently started to use it. We use NDepend to do some static code analysis, and it measures cyclomatic complexity. I agree, it's a decent way to identify methods for refactoring.
Sadly, we have seen numbers above 200 for some methods created by our offshore developers.
You'll know complexity when you see it. The main thing this kind of tool is useful for is flagging the parts of the code that were escaping your attention.
I frequently measure the cyclomatic complexity of my code. I've found it helps me spot areas of code that are doing too much. Having a tool point out the hot-spots in my code is much less time consuming than having to read through thousands of lines of code trying to figure out which methods are not following the SRP.
However, I've found that when I do a cyclomatic complexity analysis on other people's code it usually leads to feelings of frustration, angst, and general anger when I find code with cyclomatic complexity in the 100's. What compels people to write methods that have several thousand lines of code in them?!
It's great for help identifying candidates for refactoring, but it's important to keep your judgment around. I'd support kenj0418's ranges for pruning guides.
There's a Java metric called CRAP4J that empirically combines cyclomatic complexity and JUnit test coverage to come up with a single metric. Its author has been doing research to try to improve the empirical formula. I'm not sure how widespread it is.
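As far as I recall, the published CRAP formula is roughly CRAP(m) = comp(m)^2 * (1 - cov(m))^3 + comp(m), where comp is the method's cyclomatic complexity and cov its test coverage as a fraction between 0 and 1; the sketch below should be treated as an approximation rather than the tool's exact definition.

```python
def crap_score(complexity, coverage):
    # coverage is a fraction in [0, 1]
    return complexity ** 2 * (1.0 - coverage) ** 3 + complexity

print(crap_score(5, 1.0))   # 5.0  -- fully covered: score is just the complexity
print(crap_score(5, 0.0))   # 30.0 -- same complexity, no tests at all
```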
Cyclomatic complexity is just one component of what could be called fabricated complexity. A while back, I wrote an article to summarize several dimensions of code complexity:
Fighting Fabricated Complexity
Tooling is needed to be efficient at handling code complexity. The tool NDepend for .NET code will let you analyze many dimensions of the code complexity including code metrics like:
Cyclomatic Complexity, Nesting Depth, Lack Of Cohesion of Methods, Coverage by Tests...
as well as dependency analysis, and a language (Code Query Language) dedicated to asking "what is complex in my code?" and to writing rules.
Yes, we use it and I have found it useful too. We have a big legacy code base to tame, and we found alarmingly high cyclomatic complexity (387 in one method!). CC points you directly to areas that are worth refactoring. We use CCCC on C++ code.
I haven't used it in a while, but on a previous project it really helped identify potential trouble spots in someone else's code (it wouldn't be mine, of course!)
Upon finding the areas to check out, I quickly found numerous problems with the logic (also lots of GOTOs, would you believe!) and some really strange WTF code.
Cyclomatic complexity is great for showing areas which are probably doing too much and therefore breaking the single responsibility principle. These should ideally be broken up into multiple functions.
I'm afraid that for the language of the project for which I would most like metrics like this, LPC, there are not, in fact, lots of free tools for producing it available. So no, not so useful to me.
+1 for kenj0418's hit list values.
The worst I've seen was a 275. There were a couple of others over 200 that we were able to refactor down to much smaller CCs; they were still high, but it got them pushed further back in line. We didn't have much luck with the 275 beast - it was (and probably still is) a web of if- and switch-statements that was just way too complex. Its only real value is as a step-through when they decide to rebuild the system.
The exceptions to high CC that I was comfortable with were factories; IMO, they are supposed to have a high CC but only if they are only doing simple object creation and returning.
After understanding what it means, I now have started to use it on a "trial" basis. So far I have found it to be useful, because usually high CC goes hand in hand with the Arrow Anti-Pattern, which makes code harder to read and understand. I do not have a fixed number yet, but NDepend is alerting for everything above 5, which looks like a good start to investigate methods.

Real world implementations of "classical algorithms"

I wonder how many of you have implemented one of computer science's "classical algorithms" like Dijkstra's algorithm or data structures (e.g. binary search trees) in a real world, not academic project?
Is there a benefit to our dayjobs in knowing these algorithms and data structures when there are tons of libraries, frameworks and APIs which give you the same functionality?
Is there a benefit to our dayjobs in knowing these algorithms and data structures when there are tons of libraries, frameworks and APIs which give you the same functionality?
The library doesn't know what your problem domain is and won't be able to choose the correct algorithm to do the job. That is why I think it is important to know about them: then YOU can make the correct choice of algorithm to solve YOUR problem.
Knowing, or being able to understand these algorithms is important, these are the tools of your trade. It does not mean you have to be able to implement A* in an hour from memory. But you should be able to figure out what the advantages of using a red-black tree as opposed to a normal unbalanced tree are so you can decide if you need it or not. You need to be able to judge the fitness of an algorithm for solving your problem.
This might sound too schoolmasterish, but these "classical algorithms" were not invented to give college students exam questions; they were invented to solve problems or improve on current solutions. Just as the array, the linked list, or the stack are building blocks for writing a program, so are some of these. Just as in maths you move from addition and subtraction to integration and differentiation, these are advanced techniques that will help you solve problems that are out there.
They might not be directly applicable to your problems or work situation but in the long run knowing of them will help you as a professional software engineer.
To answer your question, I did an implementation of A* recently for a game.
Is there a benefit to understanding your tools, rather than simply knowing that they exist?
Yes, of course there is. Taking a trivial example, don't you think there's a benefit to knowing the difference between List (or your language's equivalent dynamic array implementation) and LinkedList (or your language's equivalent)? It's pretty important to know that one has constant random access time, while the other is linear. And one requires N copies if you insert a value in the middle of the sequence, while the other can do it in constant time.
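A rough timing sketch of that point, in Python: inserting at the front of a dynamic array shifts every element (O(n) per insert), while a structure with O(1) insertion at the ends does not (collections.deque stands in for a linked list here; it isn't literally one, but it has the relevant property).

```python
import timeit
from collections import deque

def fill_list_front(n):
    items = []
    for i in range(n):
        items.insert(0, i)   # O(n) per insertion -> O(n^2) total

def fill_deque_front(n):
    items = deque()
    for i in range(n):
        items.appendleft(i)  # O(1) per insertion -> O(n) total

print(timeit.timeit(lambda: fill_list_front(20000), number=1))
print(timeit.timeit(lambda: fill_deque_front(20000), number=1))
```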
Don't you think there's an advantage to understanding that the same sorting algorithm isn't always optimal? That for almost-sorted data, quicksort sucks, for example? Naively just calling Sort() and hoping for the best can become ridiculously expensive if you don't understand what's happening under the hood.
Of course there are a lot of algorithms you probably won't need, but even so, just understanding how they work may make it easier for yourself to come up with efficient algorithms to solve other, unrelated, problems.
Well, someone has to write the libraries. While working at a mapping software company, I implemented Dijkstra's algorithm, as well as binary search trees, B-trees, n-ary trees, BK-trees and hidden Markov models.
Besides, if all you want is a single 'well known' algorithm, and you also want the freedom to specialise it and optimise it if it becomes critical to performance, including a whole library seems like a poor choice.
We use a home-grown implementation of a pseudo-random number generator from Knuth's Seminumerical Algorithms as an aid in some statistical processing.
In my previous workplace, which was an EDA company, we implemented versions of Prim's and Dijkstra's algorithms, disjoint-set data structures, A* search and more. All of these had real-world significance. I believe this is dependent on the problem domain - some domains are more algorithm-intensive and some less so.
Having said that, there is a fine line to walk - I see no business reason for re-implementing STL or Java Generics. In many cases, a standard library is better than "inventing a wheel". The more you are near your core application, the more it may be necessary to implement a textbook algorithm or data structure.
If you never work with performance-critical code, consider yourself lucky. However, I consider this scenario unrealistic. Performance problems could occur anywhere. And then it's necessary to know how to fix that problem. Obviously, merely knowing a few algorithm names isn't enough here – unless you want to implement them all and try them out one after the other.
No, knowing (at least some of) the inner workings of different algorithms is important for gauging their strengths and weaknesses and for analyzing how they would handle your situation.
Obviously, if there's a library already implementing exactly what you need, you're incredibly lucky. But let's face it, even if there is such a library, using it is often not completely straightforward (at the very least, interfaces and data representation often have to be adapted) so it's still good to know what to expect.
A* for a Pac-Man clone. It took me weeks to really get it, but to this day I consider it a thing of beauty.
I've had to implement some of the classical algorithms from numerical analysis. It was easier to write my own than to connect to an existing library. Also, I've had to write variations on classical algorithms because the textbook case didn't fit my application.
For classical data structures, I nearly always use the standard libraries, such as STL for C++. The one time recently when I thought STL didn't have the structure I needed (a heap) I rolled my own, only to have someone point out almost immediately that I didn't need to do that.
Classical algorithms I have used in actual work:
A topological sort
A red-black tree (although I will confess that I only had to implement insertions for that application, and it only got used in a prototype). This got used to implement an 'ordered dict' type structure in Python.
A priority queue
State machines of various sorts
Probably one or two others I can't remember.
As to the second part of the question:
An understanding of how the algorithms work, their complexity and semantics gets used on a fairly regular basis. They also inform the design of systems. Occasionally one has to do things involving parsing or protocol handling, or some computation that's slightly clever. Having a working knowledge of what the algorithms do, how they work, how expensive they are and where one might find them lying around in library code goes a long way to knowing how to avoid reinventing the wheel poorly.
I use the Levenshtein distance algorithm to help implement a 'Did you mean [suggested word]?' feature in our website search.
Works quite well when combined with our 'tagging' system, which allows us to associate extra words (other than those in title/description/etc) with items in the database.
It's not perfect by any means, but it's way better than most corporate site searches, if I do say so myself ;)
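For reference, here is a minimal dynamic-programming Levenshtein distance of the kind that feature relies on; the suggestion wrapper around it (did_you_mean, its vocabulary and threshold) is made up for illustration, not our production code.

```python
def levenshtein(a, b):
    # Row-by-row DP: prev[j] holds the distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def did_you_mean(query, vocabulary, max_distance=2):
    best = min(vocabulary, key=lambda word: levenshtein(query, word))
    return best if levenshtein(query, best) <= max_distance else None

print(levenshtein("kitten", "sitting"))                         # 3
print(did_you_mean("complxity", ["complexity", "simplicity"]))  # complexity
```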
Classical algorithms are usually associated with something glamorous, like games, or Web search, or scientific computation. However, I had to use some of the classical algorithms for a mere enterprise application.
I was building a metadata migration tool, and I had to use topological sort for dependency resolution, various forms of graph traversal for queries on the metadata, and a modified variant of Tarjan's union-find data structure to partition forest-like structured metadata into trees.
That was a really satisfying experience. Most of those algorithms had been implemented before, but their implementations lacked something that I needed for my task. That's why it's important to understand their internals.
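For illustration, here is a minimal Kahn-style topological sort of the kind that dependency resolution needs; the example dependency graph (tables and what they depend on) is entirely made up.

```python
from collections import deque

def topological_order(depends_on):
    # depends_on: {node: set of nodes that must come first}
    nodes = set(depends_on) | {d for deps in depends_on.values() for d in deps}
    in_degree = {n: 0 for n in nodes}
    dependents = {n: [] for n in nodes}
    for node, deps in depends_on.items():
        in_degree[node] = len(deps)
        for dep in deps:
            dependents[dep].append(node)

    ready = deque(n for n in nodes if in_degree[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in dependents[n]:
            in_degree[m] -= 1
            if in_degree[m] == 0:
                ready.append(m)
    if len(order) != len(nodes):
        raise ValueError("cyclic dependency")
    return order

print(topological_order({"orders": {"customers", "products"},
                         "customers": set(), "products": set()}))
```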
