What are data structures at the lowest level? - data-structures

I recently watched a SICP lecture in which Sussman demonstrated how it was possible to implement Scheme's cons, car, and cdr using nothing but procedures.
It went something like this:
(define (cons x y)
  (lambda (m) (m x y)))
(define (car z)
  (z (lambda (p q) p)))
This got me thinking: what exactly are data structures? When a language is created, are data structures implemented as abstractions built up from procedures? And if they are just made up of procedures, what are procedures really, at the lowest level?
I guess what I want to know is what is at the bottom of the abstraction chain (unless it happens to be abstractions all the way down). At what point does it become hardware?

The trick is that you don't have to care. cons, cdr and car are abstractions and so their underlying implementation shouldn't matter.
What you have there are what are called "Church pairs": we build everything from functions. On modern machines we build everything from strings of 1s and 0s, but really, it doesn't matter.
Now, if you're wondering how all of these abstractions are implemented in your particular implementation, that depends. In all likelihood your compiler/interpreter is jumping through hoops behind the scenes, allocating cons cells as a tightly packed pair of pointers (or similar) and turning your functions into a string of 0s and 1s forming the appropriate machine code, paired with a pointer to their environment.
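If it helps to make the "tightly packed pair of pointers" idea concrete, here's a rough sketch in C of what a cons cell might look like inside an implementation. The names and layout are made up for illustration; real Schemes add type tags, GC headers, error handling and so on.
#include <stdlib.h>
/* A hypothetical cons cell: just two pointers packed together. */
struct cons_cell {
    void *car;   /* first element  */
    void *cdr;   /* second element */
};
struct cons_cell *cons(void *car, void *cdr) {
    struct cons_cell *cell = malloc(sizeof *cell);   /* error handling omitted */
    cell->car = car;
    cell->cdr = cdr;
    return cell;
}
void *car(struct cons_cell *cell) { return cell->car; }
void *cdr(struct cons_cell *cell) { return cell->cdr; }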
But like I said, the whole beauty of building these abstractions is that you don't have to care as a user :)

Cool question!
It becomes hardware at the point where someone hands it off to the processor (CPU) as instructions it can execute. If you have to read a manual from a chip manufacturer to understand how to do what you want to do, you're at the level you describe.
Here's an overview of the path from physics to hardware to code to users: http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-002-circuits-and-electronics-spring-2007/video-lectures/lecture-1/ although I don't think it's to do with lambda calculus. (But you're just saying that's what inspired the question—it's not the question per se, right?)
There is a step where the person who writes the language has to interface with processor instructions, eg https://en.wikipedia.org/wiki/X86_instruction_listings || https://en.wikipedia.org/wiki/X86_calling_conventions.
Peek at kernel programming or think about embedded systems (in a microwave, on the wing of an airplane), ARMs, or mobile devices—these are the things people program with that have non-laptop chipsets.
People who write BLAS (linear algebra solver libraries) sometimes get down to this level of detail. For example https://en.wikipedia.org/wiki/Math_Kernel_Library or https://en.wikipedia.org/wiki/Automatically_Tuned_Linear_Algebra_Software. When they say the BLAS is "tuned" what do they mean? They're talking about sniffing out facts about your processor and changing how inner-inner-inner loops make their decisions to waste less time with the way the physical thing is configured.
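To give a rough idea of what "tuned" can mean in code, here's a sketch of cache-blocked matrix multiplication in C. The BLOCK constant is exactly the kind of parameter an auto-tuner like ATLAS searches over for your particular processor; the numbers here are placeholders, not recommendations.
#include <stddef.h>
#define N     512   /* matrix dimension (illustrative)                        */
#define BLOCK  64   /* the tunable parameter: picked per-CPU by auto-tuners   */
/* C = A * B, processed in BLOCK x BLOCK tiles so each tile stays in cache. */
void matmul_blocked(const double A[N][N], const double B[N][N], double C[N][N]) {
    for (size_t i = 0; i < N; ++i)
        for (size_t j = 0; j < N; ++j)
            C[i][j] = 0.0;
    for (size_t ii = 0; ii < N; ii += BLOCK)
        for (size_t kk = 0; kk < N; kk += BLOCK)
            for (size_t jj = 0; jj < N; jj += BLOCK)
                for (size_t i = ii; i < ii + BLOCK; ++i)
                    for (size_t k = kk; k < kk + BLOCK; ++k)
                        for (size_t j = jj; j < jj + BLOCK; ++j)
                            C[i][j] += A[i][k] * B[k][j];
}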
If I recall correctly, high-level programming languages (like C ;) ) make no assumptions about what system they will be running on, so they make agnostic calls that run like ten times† slower than they would if they knew ahead of time which kind of call to make. Every. Time. This is the kind of thing that could drive you insane, but it's that classic engineering tradeoff of the technical people's time versus the end-user's performance. I guess if you become a kernel programmer or embedded-systems programmer you can help put an end to all of the wasted clock cycles on computers across the globe: processors getting hot as they waste cycles on pointless back-and-forth. (Although there are clearly worse things that go on in the world.)
†: I just quickly searched how big BLAS speedups are, and yeah, it can be a factor of like 15 or 20. So I don't think I'm exaggerating / misremembering what I heard about wasted movement. BTW, something similar goes on in power generation: the final step (the turbine) is only something like 20% efficient. Doesn't all that waste just drive you crazy?! Time to become an engineer. ;)
A cool project you could check out is MenuetOS; someone wrote an operating system in assembler.
Yet more cool stuff to look at could be this guy who says it's actually pretty easy and fun to learn x86 assembly language. (!)
You can also read back on the olden days when there was less distance between software and hardware (eg, programming with a punchcard). Thankfully people have written "high-level" languages that are more like the way people talk and think and less like moving some tape around. Data structures may not be the most obvious thing from everyday conversation but they are more abstract than [lists a sequence of GOTO and STORE instructions...].
HTH

The substrate at the bottom of the abstraction chain varies depending on the hardware, but it's probably going to be some sort of register machine, which will have some kind of native instruction set.
SICP's last chapter is all about this layer. The book doesn't present any real-life hardware, but an abstract (ahem turtles!) register machine that sort of resembles what might actually be going on ... which happens to be modeled in Scheme, just for extra metacircular fun. I found it a tough read, but worthwhile.
If you're interested in this sort of stuff you might want to check out Knuth, too.
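If you want a feel for what "a register machine with a native instruction set" means, here's a toy fetch-decode-execute loop in C. The instruction set and encoding are invented for illustration; real machines (x86, ARM, MIPS, ...) are vastly richer, but the basic loop has the same shape.
#include <stdio.h>
/* A toy register machine: a few registers, a tiny instruction set, and
   a fetch-decode-execute loop. */
enum op { LOADI, ADD, JNZ, HALT };          /* load-immediate, add, jump-if-nonzero, stop */
struct instr { enum op op; int dst, a, b; };
int main(void) {
    int reg[4] = { 0 };
    /* Program: sum the integers 1..5 into reg1. */
    struct instr program[] = {
        { LOADI, 0,  5, 0 },   /* reg0 = 5  (counter)          */
        { LOADI, 1,  0, 0 },   /* reg1 = 0  (accumulator)      */
        { LOADI, 2, -1, 0 },   /* reg2 = -1 (decrement)        */
        { ADD,   1,  1, 0 },   /* reg1 = reg1 + reg0           */
        { ADD,   0,  0, 2 },   /* reg0 = reg0 + reg2           */
        { JNZ,   0,  3, 0 },   /* if reg0 != 0, jump to pc 3   */
        { HALT,  0,  0, 0 },
    };
    int pc = 0;                              /* program counter */
    for (;;) {
        struct instr in = program[pc++];     /* fetch */
        switch (in.op) {                     /* decode and execute */
        case LOADI: reg[in.dst] = in.a; break;
        case ADD:   reg[in.dst] = reg[in.a] + reg[in.b]; break;
        case JNZ:   if (reg[in.dst] != 0) pc = in.a; break;
        case HALT:  printf("reg1 = %d\n", reg[1]); return 0;   /* prints 15 */
        }
    }
}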

There exist several possible bottoms for the abstraction chain. Depending on the underlying task, a computer scientist can choose the one that suits it best. The most common are the Turing machine and the lambda calculus.


How small should functions be? [duplicate]

How small should I be making functions? For example, say I have a cake-baking program:
bakeCake() {
    if (cakeType == "chocolate")
        fetchIngredients("chocolate")
    else if (cakeType == "plain")
        fetchIngredients("plain")
    else if (cakeType == "Red velvet")
        fetchIngredients("Red Velvet")
    // Rest of program
}
My question is, while this stuff is simple enough on its own, when I add much more stuff to the bakeCake function it becomes cluttered. But let's say that this program has to bake thousands of cakes per second. From what I've heard, it takes significantly longer (relative to computer time) to call another function compared to just doing the statements in the current function. So something simple like this should be very easy to read, and if efficiency is important, wouldn't I want to keep it all in there?
Basically, at what point do I sacrifice readability for efficiency? And a quick bonus question: at what point does having too many functions decrease readability? Here's an example from Apple's Swift tutorial.
func isCandyAmountAcceptable(bandMemberCount: Int, candyCount: Int) -> Bool {
    return candyCount % bandMemberCount == 0
}
They said that because the function name isCandyAmountAcceptable is easier to read than candyCount % bandMemberCount == 0, it'd be good to make a function for that. But from my perspective, while it may take a few seconds to figure out what the second option is saying, the second option is also more readable when it comes to knowing how it works.
Sorry about being all over the place and kinda asking 2 questions in one. Just to summarize my questions:
Does using functions extraneously make efficiency (speed) suffer? If it does, how can I figure out what the cutoff between readability and efficiency is?
How small and simple should I make functions? Obviously I'd make them if I ever have to repeat the code, but what about one-time-use functions?
Thanks guys, sorry if these questions are ignorant or anything but I'd really appreciate an answer.
Does using functions extraneously make efficiency (speed) suffer? If it does, how can I figure out what the cutoff between readability and efficiency is?
For performance I would generally not factor in any overhead of direct function calls against any decent optimizer, since it can even make those come free of charge. When it doesn't, it's still a negligible overhead in, say, 99.9% of scenarios. That applies even for performance-critical fields. I work in areas like raytracing, mesh processing, and image processing and still the cost of a function call is typically on the bottom of the priority list as opposed to, say, locality of reference, efficient data structures, parallelization, and vectorization. Even when you're micro-optimizing, there are much bigger priorities than the cost of a direct function call, and even when you're micro-optimizing, you often want to leave a lot of the optimization for your optimizer to perform instead of trying to fight against it and do it all by hand (unless you're actually writing assembly code).
Of course with some compilers you might deal with ones that never inline function calls and have a bit of an overhead to every function call. But in that case I'd still say it's relatively negligible since you probably shouldn't be worrying about such micro-level optimizations when using those languages and interpreters/compilers. Even then it will probably often be bottom on the priority list, relatively speaking, as opposed to more impactful things like improving locality of reference and thread efficiency.
It's like if you're using a compiler with very simplistic register allocation that has a stack spill for every single variable you use, that doesn't mean you should be trying to use and reuse as few variables as possible to work around its tendencies. It means reach for a new compiler in those cases where that's a non-negligible overhead (ex: write some C code into a dylib and use that for the most performance-critical parts), or focus on higher-level optimizations like making everything run in parallel.
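To tie this back to the question's Swift example, here's a rough C rendering of that same one-liner; the name is just a C translation of the question's example, used for illustration. With optimization enabled (e.g. gcc or clang at -O2), most compilers will simply inline a call like this, so the generated code ends up the same as writing the expression in place.
/* Tiny helper; with optimization enabled, compilers typically inline a
   call like this, so there is no call overhead left to worry about. */
static inline int is_candy_amount_acceptable(int band_member_count, int candy_count)
{
    return candy_count % band_member_count == 0;
}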
How small and simple should I make functions? Obviously I'd make them if I ever have to repeat the code, but what about one-time-use functions?
This is where I'm going to go slightly off-kilter and actually suggest you consider avoiding the teeniest of functions for maintainability reasons. This is admittedly a controversial opinion although at least John Carmack seems to somewhat agree (specifically in respect to inlining code and avoiding excess function calls for cases where side effects occur to make the side effects easier to comprehend).
However, if you are going to make a lot of state changes, having them all happen inline does have advantages; you should be made constantly aware of the full horror of what you are doing.
The reason I believe it can sometimes be good to err on the side of meatier functions is that understanding all the information necessary to make a change or fix a problem often involves comprehending more than one simple function.
Which is simpler to comprehend, a function whose logic consists of 80 lines of inlined code, or one distributed across a couple dozen functions and possibly ones that lead to disparate places throughout the codebase?
The answer is not so clear cut. Naturally if the teeny functions are used widely, like say sqrt or abs, then the reader can simply skim over the function call, knowing full well what it does like the back of his hand. But if there are a lot of teeny exotic functions that are only used one time, then the ability to comprehend the operation as a whole requires looking them up and understanding what they all individually do before you can get a proper comprehension of what's going on in terms of the big picture.
I actually disagree somewhat with that Apple Swift tutorial about that one-liner function, because while it is easier to understand than figuring out what the arithmetic and comparison are supposed to do, in exchange it might require looking the function up in the scenarios where isCandyAmountAcceptable alone isn't enough information for you and you need to figure out exactly what makes an amount acceptable. Instead I would actually prefer a simple comment:
// Determine if candy amount is acceptable.
if (candyCount % bandMemberCount == 0)
...
... because then you don't have to jump to disparate places in the code (the analogy of a book referring its reader to other pages, making the reader constantly flip back and forth) to figure that out. Of course the idea behind this isCandyAmountAcceptable kind of function is that you shouldn't have to be concerned with the details of what makes a candy amount acceptable, but too often in practice we do end up having to understand the details more than we optimally should, to debug the code or make changes to it. If the code never needs to be debugged or changed, then it doesn't really matter how it's written. It could even be written in binary code for all we care. But if it's written to be maintained, as in debugged and changed in the future, then sometimes it is helpful to avoid making readers jump through lots of hoops. The details do often matter in those scenarios.
So sometimes it doesn't help to understand the big picture by fragmenting it into the teeniest of puzzle pieces. It's a balancing act, but certain types of developers can err on the side of overly dicing up their systems into the most granular bits and pieces and finding maintenance problems that way. Those types are still often promising engineers -- they just have to find their balance. The other extreme is the one that writes 500-line functions and doesn't even consider refactoring -- those are kinda hopeless. But I think you fit in the former category, and for you, I'd actually suggest erring on the side of meatier functions ever-so-slightly just to keep the puzzle pieces a healthy size (not too small, not too big).
There's even a balancing act I see between code duplication and minimizing dependencies. An image library doesn't necessarily become easier to comprehend by shaving off a few dozen lines of duplicated math code if the exchange is a dependency to a complex math library with 800,000 lines of code and an epic manual on how to use it. In such cases, the image library might very well be easier to comprehend as well as use and deploy in new projects if it chooses instead to duplicate a few math functions here and there to avoid external dependencies, isolating its complexity instead of distributing it elsewhere.
Basically, at what point do I sacrifice readability for efficiency?
As stated above, I don't think readability of the small picture and comprehensibility of the big picture are synonymous. It can be really easy to read a two-line function and know what it does and still be miles away from understanding what you need to understand to make the necessary changes. Having many of those teeny one-shot two-liners can even delay the ability to comprehend the big picture.
But if I use "comprehensibility vs. efficiency" instead, I'd say upfront at the design-level for cases where you anticipate processing huge inputs. As an example, a video processing application with custom filters knows it's going to be looping over millions of pixels many times per frame. That knowledge should be utilized to come up with an efficient design for looping over millions of pixels repeatedly. But that's with respect to design -- towards the central aspects of the system that many other places will depend upon because big central design changes are too costly to apply late in hindsight.
That doesn't mean it has to start applying hard-to-understand SIMD code right off the bat. That's an implementation detail provided the design leaves enough breathing room to explore such an optimization in hindsight. Such a design would imply abstracting at the Image level, at the level of a million+ pixels, not at the level of a single IPixel. That's the worthy thing to take into consideration upfront.
Then later on, you can optimize hotspots and potentially use some difficult-to-understand algorithms and micro-optimizations here and there for those truly critical cases where there's a strong perceived business need for the operation to go faster, and hopefully with good tools (profilers, e.g.) in hand. The use cases guide you toward which operations to optimize, based on what the users do most often and where they most strongly wish they spent less time waiting. The profiler guides you toward precisely which parts of the code involved in that operation need to be optimized.
Readability, performance and maintainability are three different things. Readability will make your code look simple and understandable, but it is not necessarily the best way to go on its own. Performance is always going to be important, unless you are running the code in a non-production environment where the end result matters more than how it was achieved. Enter the world of enterprise applications, and maintainability suddenly gains a lot more importance. What you work on today will be handed over to somebody else after 6 months, and they will be fixing/changing your code. This is why standard design patterns suddenly become so important. In a way, readability is part of maintainability on a larger scale. If the cake-baking program above is anything more complex than it looks, the first thing that stands out as a code smell is the chain of if-else. It ought to be replaced with polymorphism, as sketched below. The same goes for switch-case constructs.
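As a sketch of what that replacement could look like (here in C, using a dispatch table as the closest analogue to polymorphism; the names mirror the question's hypothetical cake program and are made up for illustration):
#include <stdio.h>
#include <string.h>
typedef void (*fetch_fn)(void);
static void fetch_chocolate(void)  { puts("fetching chocolate ingredients"); }
static void fetch_plain(void)      { puts("fetching plain ingredients"); }
static void fetch_red_velvet(void) { puts("fetching red velvet ingredients"); }
/* One table row per cake type, instead of one else-if branch per type. */
static const struct { const char *type; fetch_fn fetch; } recipes[] = {
    { "chocolate",  fetch_chocolate  },
    { "plain",      fetch_plain      },
    { "Red velvet", fetch_red_velvet },
};
void fetchIngredients(const char *cake_type) {
    for (size_t i = 0; i < sizeof recipes / sizeof recipes[0]; ++i) {
        if (strcmp(recipes[i].type, cake_type) == 0) {
            recipes[i].fetch();
            return;
        }
    }
}
Adding a new cake type is then just one more table row rather than another branch in the middle of bakeCake.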
At what point do you decide to sacrifice one for the other? That purely depends on what business goal your code is achieving. Is it academic? Then it has to be the perfect solution, even if it means 90% of devs struggle to figure out at first glance what the hell is happening. Is it a retail store's website maintained by a distributed team of 50 devs working from 2 or more geographic locations? Follow the conventional design patterns.
A rule of thumb I have seen followed in almost all situations is that if a function grows beyond half the screen, it's a candidate for refactoring. Do you have functions so long that your editor grows a long scroll bar just to get through them? Refactor!

Where to learn about low-level, hard-core performance stuffs?

This is actually a 2 part question:
People who want to squeeze out every clock cycle talk about pipelines, cache locality, etc.
I have seen these low-level performance techniques mentioned here and there, but I have not seen a good introduction to the subject, from start to finish. Any resource recommendations? (Google gave me definitions and papers, whereas I'd really appreciate some kind of worked examples/tutorials, real-life hands-on kind of materials.)
How does one actually measure these kinds of things? With a profiler of some sort? I know we can always change the code, see the improvement and theorize in retrospect; I am just wondering if there are established tools for the job.
(I know algorithm optimization is where the orders of magnitude are. I am interested in the metal here.)
The chorus of replies is, "Don't optimize prematurely." As you mention, you will get a lot more performance out of a better design than a better loop, and your maintainers will appreciate it, as well.
That said, to answer your question:
Learn assembly. Lots and lots of assembly. Don't MUL by a power of two when you can shift. Learn the weird uses of xor to copy and clear registers. For specific references,
http://www.mark.masmcode.com/ and http://www.agner.org/optimize/
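To see the shift-versus-multiply point in a form you can experiment with, here is a trivial C pair; compile it with optimization and read the generated assembly, keeping in mind that modern compilers already do this transformation for you.
/* Two ways to multiply by 8; compilers typically emit a shift (or an LEA
   on x86) for both, so inspect the generated assembly rather than assume. */
unsigned times8_mul(unsigned x)   { return x * 8u; }
unsigned times8_shift(unsigned x) { return x << 3; }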
Yes, you need to time your code. On *nix, it can be as easy as time { commands ; } but you'll probably want to use a full-featured profiler. GNU gprof is open source: http://www.cs.utah.edu/dept/old/texinfo/as/gprof.html
If this really is your thing, go for it, have fun, and remember, lots and lots of bit-level math. And your maintainers will hate you ;)
EDIT/REWRITE:
If it is books you need, Michael Abrash did a good job in this area: Zen of Assembly Language, a number of magazine articles, the Graphics Programming Black Book, etc. Much of what he was tuning for is no longer a problem; the problems have changed. What you will get out of these is a sense of the kinds of things that can cause bottlenecks and the kinds of ways to solve them. Most important is to time everything, and understand how your timing measurements work so that you are not fooling yourself by measuring incorrectly. Time the different solutions and try crazy, weird solutions; you may find an optimization that you were not aware of and didn't realize existed until you exposed it.
I have only just started reading it, but See MIPS Run (early/first edition) looks good so far (note that ARM has since overtaken MIPS as the leader in the processor market, so the MIPS and RISC hype is a bit dated). There are a number of textbooks, old and new, to be had about MIPS. MIPS was designed for performance (at the cost of the software engineer, in some ways).
The bottlenecks today fall into the categories of the processor itself and the I/O around it, and whatever is connected to that I/O. The insides of the processor chips themselves (for higher-end systems) run much faster than the I/O can handle, so you can only tune so far before you have to go off chip and wait forever. Getting from the train to your destination half a minute faster is not necessarily a worthwhile optimization when the train ride itself was 3 hours.
It is all about learning the hardware; you can probably stay within the ones-and-zeros world and not have to get into the actual electronics. But without really knowing the interfaces and internals you cannot do much performance tuning. You might re-arrange or change a few instructions and get a little boost, but to make something several hundred times faster you need more than that. Learning a lot of different instruction sets (assembly languages) helps you get into the processors. I would recommend simulating HDL, for example the processors at OpenCores, to get a feel for how some folks do their designs and to get a solid handle on how to really squeeze clocks out of a task. Processor knowledge is big; memory interfaces are a huge deal and need to be learned, as do media (flash, hard disks, etc.), displays and graphics, networking, and all the types of interfaces between all of those things. And understanding at the clock level, or as close to it as you can get, is what it takes.
Intel and AMD provide optimization manuals for x86 and x86-64.
http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html/
http://developer.amd.com/documentation/guides/pages/default.aspx
Another excellent resource is Agner Fog's optimization site.
http://www.agner.org/optimize/
Some of the key points (in no particular order):
Alignment; memory, loop/function labels/addresses
Cache; non-temporal hints, page and cache misses
Branches; branch prediction and avoiding branching with compare&move op-codes
Vectorization; using SSE and AVX instructions
Op-codes; avoiding slow running op-codes, taking advantage of op-code fusion
Throughput / pipeline; re-ordering or interleaving op-codes that perform separate tasks, avoiding partial stalls and keeping the processor's ALUs and FPUs saturated
Loop unrolling; performing several iterations' worth of work per "loop comparison, branch" (see the small sketch after this list)
Synchronization; using atomic op-code (or LOCK prefix) to avoid high level synchronization constructs
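As a small illustration of the loop-unrolling item, here is a C sketch. Compilers often do this (and vectorization) for you at higher optimization levels, so measure before and after rather than assuming it helps.
#include <stddef.h>
/* Unrolled summation: four additions per loop test/branch instead of one,
   with independent accumulators so the additions can overlap. */
float sum_unrolled4(const float *a, size_t n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; ++i)           /* leftover elements */
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}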
Yes, measure, and yes, know all those techniques.
Experienced people will tell you "don't optimize prematurely", which I relate as simply "don't guess".
They will also say "use a profiler to find the bottleneck", but I have a problem with that. I hear lots of stories of people using profilers and either liking them a lot or being confused with their output.
SO is full of them.
What I don't hear a lot of is success stories, with speedup factors achieved.
The method I use is very simple, and I've tried to give lots of examples, including this case.
I'd suggest Optimizing subroutines in assembly language: An optimization guide for x86 platforms.
It's quite heavy stuff though ;)

Computing Efficiency Question

I'm wondering about computational efficiency. I'm going to use Java in this example, but it is a general computing question. Let's say I have a string and I want to get the value of the first letter of the string, as a string. So I can do:
String firstletter = String.valueOf(somestring.toCharArray()[0]);
Or I could do:
char[] stringaschar = somestring.toCharArray();
char firstchar = stringaschar[0];
String firstletter = String.valueOf(firstchar);
My question is, are the two ways essentially the same, computationally? I mean, the second way I explicitly had to create 2 intermediate variables, to be stored in memory (the stack?) temporarily.
But the first way, too, the computer will have to still create the same variables, implicitly, right? And the number of operations doesn't change. My thinking is, the two ways are the same. But I'd like to know for sure.
In most cases the two ways should produce the same, or nearly the same, object code. Optimizing compilers usually detect that the intermediate variables in the second option are not necessary to get the correct result, and will collapse the call graph accordingly.
This all depends on how your Java interpreter decides to translate your code into an intermediary language for runtime execution. It may actually have optimizations which translate the two approaches to be the same exact byte code.
The two should be essentially the same. In both cases you make the same calls: converting the string to an array, finding the first character, and getting the value of the character. There may be minor differences in how the compiler handles these, but they should be insignificant.
The earlier answers agree with each other and are right, AFAIK.
However, I think there are a few additional and general considerations you should be aware of each time you wonder about the efficiency of any computational asset (code, for example).
First, if everything is under your strict control you could in principle count clock cycles one by one from assembly code. Or from some more abstract reasoning find the computational cost of an operation/algorithm.
So far so good. But don't forget to measure afterwards. You may find that measuring execution times is not so easy and straightforward, and sometimes it is elusive (how to account for interrupts, for I/O waits, for network bottlenecks ...). But it pays. You ask here for counsel, but YOUR compiler/interpreter/P-code generator/whatever could be set with just THAT switch in the third layer of your config scripts.
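As one concrete example, a minimal wall-clock measurement in C could look like the sketch below. It uses the POSIX clock_gettime call; the loop is a placeholder workload, and every caveat above about interrupts, I/O waits and warm caches still applies, so repeat the run several times.
#include <stdio.h>
#include <time.h>
int main(void) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    volatile double s = 0.0;             /* the work being measured */
    for (long i = 0; i < 10000000L; ++i)
        s += (double)i * 0.5;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("elapsed: %.3f s (result %f)\n", secs, s);
    return 0;
}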
The other consideration, more to your current point, is the existence of Black Boxes. You are not alone in the world, and a Black Box is any piece used to run your code which is essentially out of your control. Compilers, Operating Systems, Networks, Storage Systems, and the World in general fall into this category.
What we do with Black Boxes (they are black either because their code is not public, or because we just happen to spend our free time fishing instead of digging through library source code) is establish mental models to help us understand how they work. (BTW, this is an extraordinary book about how we humans forge our mental models.) But you should always be aware that they are models, not the real thing. Models help us to explain things ... to a certain extent. Classical mechanics reigned until relativity and quantum mechanics flourished. None of them is wrong; they have limits, and so do all our models.
Even if you happen to be friends with your router's OS, or your Linux kernel, when confronting an efficiency problem, design a good experiment and measure.
HTH!
NB: By "design a good experiment" I mean beware of the tar pits. Examples: measuring your measurement code instead of the target of the experiment, being influenced by external factors, forgetting external factors that will influence the production code, testing with data whose cardinality, orthogonality, or whatever-ality is dissimilar to the "real world", wrongly mapping the production and testing client/server workhorses, etc., etc.
So go and measure your code. Your results will be the most interesting thing on this page.

Optimization! - What is it? How is it done?

It's common to hear about "highly optimized code" or some developer needing to optimize theirs, and whatnot. However, as a self-taught, new programmer, I've never really understood what exactly people mean when talking about such things.
Care to explain the general idea of it? Also, recommend some reading materials and really whatever you feel like saying on the matter. Feel free to rant and preach.
Optimize is a term we use lazily to mean "make something better in a certain way". We rarely "optimize" something; rather, we just improve it until it meets our expectations.
Optimizations are changes we make in the hopes to optimize some part of the program. A fully optimized program usually means that the developer threw readability out the window and has recoded the algorithm in non-obvious ways to minimize "wall time". (It's not a requirement that "optimized code" be hard to read, it's just a trend.)
One can optimize for:
Memory consumption - Make a program or algorithm's runtime size smaller.
CPU consumption - Make the algorithm computationally less intensive.
Wall time - Do whatever it takes to make something faster
Readability - Instead of making your app better for the computer, you can make it easier for humans to read it.
Some common (and overly generalized) techniques to optimize code include:
Change the algorithm to improve performance characteristics. If you have an algorithm that takes O(n^2) time or space, try to replace that algorithm with one that takes O(n * log n).
To relieve memory consumption, go through the code and look for wasted memory. For example, if you have a string-intensive app you can switch to using substring references (where a reference contains a pointer to the string, plus indices to define its bounds) instead of allocating and copying memory from the original string (see the small sketch after this list).
To relieve CPU consumption, cache as many intermediate results as you can. For example, if you need to calculate the standard deviation of a set of data, save that single numerical result instead of looping through the set each time you need to know the std dev.
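Here is a minimal sketch of the substring-reference idea from the list above, in C; the type and function names are invented for illustration.
#include <stddef.h>
/* A "substring reference": a pointer into the original buffer plus a
   length, so nothing is allocated or copied. */
struct str_ref {
    const char *start;   /* points into the original string */
    size_t      len;     /* number of characters referenced */
};
/* Refer to len characters of s starting at begin, without copying. */
struct str_ref substr_ref(const char *s, size_t begin, size_t len) {
    struct str_ref r = { s + begin, len };
    return r;
}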
I'll mostly rant with no practical advice.
Measure first. Optimization should be done in the places where it matters. Highly optimized code is often difficult to maintain and a source of problems. In places where the code does not slow down execution anyway, I always prefer maintainability to optimizations. Familiarize yourself with profiling, both intrusive (instrumented) and non-intrusive (low-overhead statistical). Learn to read a profiled stack, understand where the inclusive/exclusive time is spent, why certain patterns show up, and how to identify the trouble spots.
You can't fix what you cannot measure. Have your program report, through some performance infrastructure, the things it does and the time it takes. I come from a Win32 background so I'm used to Performance Counters, and I'm extremely generous at sprinkling them all over my code. I even automated the code to generate them.
And finally some words about optimizations. Most discussion about optimization I see focuses on stuff any compiler will optimize for you for free. In my experience the greatest source of gains for 'highly optimized code' lies completely elsewhere: memory access. On modern architectures the CPU is idling most of the time, waiting for memory to be served into its pipelines. Between L1 and L2 cache misses, TLB misses, NUMA cross-node access and even page faults that must fetch the page from disk, the memory access pattern of a modern application is the single most important optimization one can make. I'm exaggerating slightly; of course there will be counter-example workloads that do not benefit from these memory-access-locality techniques. But most applications will. To be specific, what these techniques mean is simple: cluster your data in memory so that a single CPU can work on a tight memory range containing all it needs, with no expensive referencing of memory outside your cache lines or your current page. In practice this can mean something as simple as accessing an array by rows rather than by columns.
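A deliberately tiny C illustration of that last sentence, assuming a large row-major 2-D array:
#include <stddef.h>
#define ROWS 1024
#define COLS 1024
/* Row-by-row traversal touches memory sequentially, because C stores
   2-D arrays in row-major order: cache friendly. */
double sum_by_rows(const double m[ROWS][COLS]) {
    double s = 0.0;
    for (size_t i = 0; i < ROWS; ++i)
        for (size_t j = 0; j < COLS; ++j)
            s += m[i][j];
    return s;
}
/* Column-by-column traversal jumps COLS * sizeof(double) bytes between
   consecutive accesses, so large matrices take far more cache misses. */
double sum_by_cols(const double m[ROWS][COLS]) {
    double s = 0.0;
    for (size_t j = 0; j < COLS; ++j)
        for (size_t i = 0; i < ROWS; ++i)
            s += m[i][j];
    return s;
}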
I would recommend you read the AlphaSort paper presented at the VLDB conference in 1995. This paper showed how cache-sensitive algorithms designed specifically for modern CPU architectures can blow the previous benchmarks out of the water:
We argue that modern architectures require algorithm designers to re-examine their use of the memory hierarchy. AlphaSort uses clustered data structures to get good cache locality...
The general idea is that when the compiler builds your syntax tree during the compilation phase, before generating code from it, it performs an additional step (optimization) where, based on certain heuristics, it collapses branches together, deletes branches that aren't used, or adds extra nodes for temporary variables that are used multiple times.
Think of stuff like this piece of code:
a=(b+c)*3-(b+c)
which gets translated into
         -
       /   \
      *     +
     / \   / \
    +   3 b   c
   / \
  b   c
To the optimizer it would be obvious that the two + subtrees with their two descendants are identical, so they would be merged into a temporary variable, t, and the tree would be rewritten:
      -
     / \
    *   t
   / \
  t   3
Now an even better optimizer would see that t*3 - t is simply t*2, so the tree could be further simplified to:
    *
   / \
  t   2
and the intermediary code that you'd run your code generation step on would finally be
int t=b+c;
a=t*2;
with t marked as a register variable, which is exactly what would be written for assembly.
One final note: you can optimize for more than just run time speed. You can also optimize for memory consumption, which is the opposite. Where unrolling loops and creating temporary copies would help speed up your code, they would also use more memory, so it's a trade off on what your goal is.
Here is an example of some optimization (fixing a poorly made decision) that I did recently. It's very basic, but I hope it illustrates that good gains can be made even from simple changes, and that 'optimization' isn't magic; it's just about making the best decisions to accomplish the task at hand.
In an application I was working on there were several LinkedList data structures that were being used to hold various instances of foo.
When the application was in use it was very frequently checking to see if the LinkedList contained object X. As the number of X's started to grow, I noticed that the application was performing more slowly than it should have been.
I ran a profiler and realized that each 'myList.Contains(x)' call was O(N), because the list has to iterate through each item it contains until it reaches the end or finds a match. This was definitely not efficient.
So what did I do to optimize this code? I switched most of the LinkedList data structures to HashSets, which can do a '.Contains(X)' call in O(1), which is much better.
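The same idea in a rough C sketch, since C has no standard hash set: replace the O(N) linear scan with a constant-time lookup. The presence table below assumes small non-negative integer keys; a real program would use a proper hash table.
#include <stdbool.h>
#include <stddef.h>
/* Linear scan, like LinkedList.Contains: O(N) per query. */
bool contains_linear(const int *items, size_t n, int x) {
    for (size_t i = 0; i < n; ++i)
        if (items[i] == x)
            return true;
    return false;
}
/* Constant-time membership test, like HashSet.Contains, assuming the
   keys are small non-negative integers. */
#define MAX_KEY 100000
static bool present[MAX_KEY + 1];
void add_key(int x)       { present[x] = true; }
bool contains_fast(int x) { return present[x]; }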
This is a good question.
Usually the best practice is 1) just write the code to do what you need it to do, 2) then deal with performance, but only if it's an issue. If the program is "fast enough" it's not an issue.
If the program is not fast enough (like it makes you wait) then try some performance tuning. Performance tuning is not like programming. In programming, you think first and then do something. In performance tuning, thinking first is a mistake, because that is guessing.
Don't guess what to fix; diagnose what the program is doing.
Everybody knows that, but mostly they do it anyway.
It is natural to say "Could be the problem is X, Y, or Z" but only the novice acts on guesses. The pro says "but I'm probably wrong".
There are different ways to diagnose performance problems.
The simplest is just to single-step through the program at the assembly-language level, and don't take any shortcuts. That way, if the program is doing unnecessary things, then you are doing the same things, and it will become painfully obvious.
Another is to get a profiling tool, and as others say, measure, measure, measure.
Personally I don't care for measuring. I think it's a fuzzy microscope for the purpose of pinpointing performance problems. I prefer this method, and this is an example of its use.
Good luck.
ADDED: I think you will find, if you go through this exercise a few times, you will learn what coding practices tend to result in performance problems, and you will instinctively avoid them. (This is subtly different from "premature optimization", which is assuming at the beginning that you must be concerned about performance. In fact, you will probably learn, if you don't already know, that premature concern about performance can well cause the very problem it seeks to avoid.)
Optimizing a program means: make it run faster
The only way of making the program faster is making it do less:
find an algorithm that uses fewer operations (e.g. N log N instead of N^2)
avoid slow components of your machine (keep objects in cache instead of in main memory, or in main memory instead of on disk); reducing memory consumption nearly always helps!
Further rules:
In looking for optimization opportunities, adhere to the 80-20-rule: 20% of typical program code accounts for 80% of execution time.
Measure the time before and after every attempted optimization; often enough, optimizations don't actually help.
Only optimize after the program runs correctly!
Also, there are ways to make a program appear to be faster:
separate GUI event processing from back-end tasks; prioritize user-visible changes over back-end calculation to keep the front-end "snappy"
give the user something to read while performing long operations (ever noticed the slideshows displayed by installers?)
However, as a self-taught, new programmer I've never really understood what exactly do people mean when talking about such things.
Let me share a secret with you: nobody does. There are certain areas where we know mathematically what is and isn't slow. But for the most part, performance is too complicated to be able to understand. If you speed up one part of your code, there's a good possibility you're slowing down another.
Therefore, if anyone tells you that one method is faster than another, there's a good possibility they're just guessing, unless one of three things is true:
They have data
They're choosing an algorithm that they know is faster mathematically.
They're choosing a data structure that they know is faster mathematically.
Optimization means trying to improve computer programs for such things as speed. The question is very broad, because optimization can involve compilers improving programs for speed, or human beings doing the same.
I suggest you read a bit of theory first (from books, or Google for lecture slides):
Data structures and algorithms - what the O() notation is, how to calculate it, and what data structures and algorithms can be used to lower the O-complexity
Book: Introduction to Algorithms by Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest
Compilers and assembly - how code is translated to machine instructions
Computer architecture - how the CPU, RAM, Cache, Branch predictions, out of order execution ... work
Operating systems - kernel mode, user mode, scheduling processes/threads, mutexes, semaphores, message queues
After reading a bit of each, you should have a basic grasp of all the different aspects of optimization.
Note: I wiki-ed this so people can add book recommendations.
I am going with the idea that optimizing code means getting the same results in less time. And "fully optimized" only means they ran out of ideas to make it faster. I throw large buckets of scorn on claims of "fully optimized" code! There's no such thing.
So you want to make your application/program/module run faster? The first thing to do (as mentioned earlier) is measure, also known as profiling. Do not guess where to optimize. You are not that smart and you will be wrong. My guesses are wrong all the time, and large portions of my year are spent profiling and optimizing. So get the computer to do it for you. On the PC, VTune is a great profiler. I think VS2008 has a built-in profiler, but I haven't looked into it. Otherwise, measure functions and large pieces of code with performance counters. You'll find sample code for using performance counters on MSDN.
So where are your cycles going? You are probably waiting for data coming from main memory. Go read up on L1 & L2 caches. Understanding how the cache works is half the battle. Hint: Use tight, compact structures that will fit more into a cache-line.
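One way to read that hint, sketched in C with invented field names: keep only the data the hot loop actually touches in a small, tightly packed struct, so more elements fit per cache line.
#include <stdint.h>
/* A "loose" layout: every access to a position also drags the name and
   id through the cache, even when the hot loop never looks at them. */
struct particle_loose {
    double  x, y, z;
    char    name[64];
    int64_t id;
};
/* A compact, hot-path-only layout: far more of these fit in one cache
   line, so a loop over positions takes far fewer cache misses. */
struct particle_pos {
    float x, y, z;
};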
Optimization is lots of fun. And it's never ending too :)
A great book on optimization is Write Great Code: Understanding the Machine by Randall Hyde.
Make sure your application produces correct results before you start optimizing it.

Digital Circuit understanding

In my quest for getting some basics down before I start going into programming I am looking for essential knowledge about how the computer works down at the core level.
I have a theory that actually understanding what, for instance, a stack overflow (let alone a stack) is, instead of relying on my sporadic knowledge of computer systems, will help me in the longer term.
Are there any books or sites that take you through how processors are structured, give a holistic overview, and relate it to the digital logic that is good to know about?
Am I making sense?
Yes, you should read some topics of
John L. Hennessy & David A. Patterson, "Computer Architecture: A Quantitative Approach"
It covers microprocessors' history and theory (starting with RISC architectures - MIPS), pipelining, memory, storage, etc.
David Patterson is a Professor of Computer Science in the EECS Department at U.C. Berkeley. http://www.eecs.berkeley.edu/~pattrsn/
Hope it helps, here's the link
Tanenbaum's Structured Computer Organization is a good book about how computers work. You might find it hard to get through the book, but that's mostly due to the subject, not the author.
However, I'm not sure I would recommend taking this approach. Understanding how the computer works can certainly be useful, but if you don't really have any programming knowledge, you can't really put your knowledge to good use - and you probably don't need that knowledge yet anyway. You would be better off learning about topics like object-oriented programming and data structures to learn about program design, because unless you're looking at doing embedded programming on very limited systems, you'll find those skills far more useful than knowledge of a computer's inner workings.
In my opinion, 20 years ago it was possible to understand the whole spectrum from BASIC all the way through operating system, hardware, down to the transistor or even quantum level. I don't know that it's possible for one person to understand that whole spectrum with today's technology. (Years ago, everyone serviced their own car. Today it's too hard.)
Some of the "layers" that you might be interested in:
http://en.wikipedia.org/wiki/Boolean_logic (this will be helpful for programming)
http://en.wikipedia.org/wiki/Flip-flop_%28electronics%29
http://en.wikipedia.org/wiki/Finite-state_machine
http://en.wikipedia.org/wiki/Static_random_access_memory
http://en.wikipedia.org/wiki/Bus_%28computing%29
http://en.wikipedia.org/wiki/Microprocessor
http://en.wikipedia.org/wiki/Computer_architecture
It's pretty simple really - the CPU loads instructions and executes them; most of those instructions revolve around loading values into registers or memory locations, and then manipulating those values. Certain memory ranges are set aside for communicating with the peripherals that are attached to the machine, such as the screen or hard drive.
Back in the days of the Apple ][ and Commodore 64 you could put a value directly into a memory location and that would directly change a pixel on the screen. Those days are long gone; it is abstracted away from you (the programmer) by several layers of code, such as drivers and the operating system.
You can learn about this sort of stuff, or assembly language (which I am a huge fan of), or AND/NAND gates at the hardware level, but knowing this sort of stuff is not going to help you code up a web application in ASP.NET MVC, or write a quick and dirty Python or PowerShell script.
There are lots of resources sprinkled around the net that will give you insight into how the CPU and the rest of the hardware work, but if you want to get down and dirty I honestly think you should buy one of those older machines off eBay or somewhere and learn its particular flavour of assembly language (I understand there are also a lot of programmable PIC controllers out there that might be good to learn on). Picking up an older machine is going to eliminate the software abstractions and make things way easier to learn. You learn way better when you get instant gratification, like making sprites move around a screen or generating sounds by directly toggling the speaker (or using a PIC controller to control a small robot). With those older machines, the schematics for an Apple ][ motherboard fit onto a roughly A2-size sheet of paper that was folded into the back of one of the Apple manuals; I would hate to imagine what they look like these days.
While I agree with the previous answers insofar as it is incredibly difficult to understand the entire process, we can at least break it down into categories, from lowest (closest to electrons) to highest (closest to what you actually see).
Lowest
Solid State Device Physics (How transistors work physically)
Circuit Theory (How transistors are combined to create logic gates)
Digital Logic (How logic gates are put together to create digital functions or digital structures, i.e. multiplexers, full adders, etc.; see the small adder sketch after this list)
Hardware Organization (How the data path is laid out in the CPU, the components of a Von Neumann machine -> memory, processor, Arithmetic Logic Unit, fetch/decode/execute)
Microinstructions (Bit level programming)
Assembly (Programming with words, but directly specifying registers; takes forever to program even simple things)
Interpreted/Compiled Languages (Programming languages that get compiled or interpreted to assembly; the operating system may be in one of these)
Operating System (Process scheduling, hardware interfaces, abstracts lower levels)
Higher level languages (these kind of appear twice; it depends on the language. Java is done at a very high level, but C goes straight to assembly, and the C compiler is probably written in C)
User Interfaces/Applications/Gui (Last step, making it look pretty)
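To make the Digital Logic layer slightly more concrete, here is a 1-bit full adder expressed with C bitwise operators standing in for the gates (purely illustrative):
#include <stdio.h>
/* A 1-bit full adder built from the gate-level operations: XOR for the
   sum, AND/OR for the carry. */
static void full_adder(int a, int b, int carry_in, int *sum, int *carry_out) {
    *sum       = a ^ b ^ carry_in;                 /* XOR gates     */
    *carry_out = (a & b) | (carry_in & (a ^ b));   /* AND, OR gates */
}
int main(void) {
    int sum, carry;
    full_adder(1, 1, 0, &sum, &carry);
    printf("1 + 1 + 0 -> sum=%d carry=%d\n", sum, carry);  /* sum=0 carry=1 */
    return 0;
}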
You can find out a lot about each of these. I'm only somewhat expert in the digital logic side of things. If you want a thorough tutorial on digital logic from the ground up, go to the electrical engineering menu of my website:
affablyevil.wordpress.com
I'm teaching the class, and adding online lessons as I go.

Resources