Are there any resources for language independent performance tips? - performance

I work with many people who program video games for a living. I have quite a bit of knowledge in C++ and I know a number of general performance strategies to use in day-to-day programming, like preferring prefix ++/-- over postfix.
My problem is that people often come to me asking for tips on general optimizations they can apply on a regular basis when programming, but these people program in all sorts of languages. Some use C++, C#, Java, ActionScript, etc.
I am wondering if there are any general performance tips that can be applied on a day-to-day programming basis. For example, I would suggest prefix ++/-- over postfix to people programming in another language, but I am just not sure whether that holds there.
My guess is that this is language specific, and that the best approach to general optimization is to make sure you are not using badly bloated algorithms, but maybe someone has better advice.

Without going into language specifics, or even knowing whether this is embedded, web, CAD, game, or iPhone programming, there isn't much that can be said. All we know is that there are multiple languages involved, and for some unknown reason performance is always slower than desirable.
First, check your algorithms. A slow algorithm can cause horrible performance. Read up on algorithms and their complexity (a small illustration follows at the end of this answer).
Second, note if there are any really slow operations, such as hitting a database or transmitting information or moving a robot arm. See if the program is doing more of those than it should.
Third, profile. If there's a section of code that's taking 5% of the time, no optimization will make your program more than 5% faster. If a section of code is taking a lot of the time, it's worth looking at.
Fourth, get somebody who knows what they're doing to make any specific optimizations. Test them when they're done to make sure they actually speed up performance. When performance was an issue, I've improved it with some counterintuitive measures, like rolling up loops.
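To make the first point concrete, here is a minimal sketch of how the choice of algorithm dwarfs any micro-optimization (Java is used here purely as an example language; the idea is language-agnostic). Both methods check for duplicate entries; the first is O(n^2), the second roughly O(n):

    import java.util.HashSet;
    import java.util.Set;

    public class DuplicateCheck {
        // Quadratic: compares every pair of elements.
        static boolean hasDuplicateQuadratic(int[] values) {
            for (int i = 0; i < values.length; i++) {
                for (int j = i + 1; j < values.length; j++) {
                    if (values[i] == values[j]) {
                        return true;
                    }
                }
            }
            return false;
        }

        // Roughly linear: one pass with a hash set.
        static boolean hasDuplicateLinear(int[] values) {
            Set<Integer> seen = new HashSet<>();
            for (int v : values) {
                if (!seen.add(v)) {   // add() returns false if v was already present
                    return true;
                }
            }
            return false;
        }

        public static void main(String[] args) {
            int[] data = {3, 1, 4, 1, 5, 9, 2, 6};
            System.out.println(hasDuplicateQuadratic(data)); // true
            System.out.println(hasDuplicateLinear(data));    // true
        }
    }

For a few dozen elements the difference is irrelevant; for a million elements the quadratic version becomes unusable no matter how clever its inner loop is.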

I don't think you can generalize optimization as such. To optimize execution time, you need to dig deep into the language and understand how things work in detail. Just guessing, or carrying over assumptions from experience with other languages, won't work. For example, writing x = x << 1 instead of x = x*2 might be a big benefit in C++. In JavaScript it will slow you down.
With all the differences between all the languages it's hard to find generic optimization tips. Maybe for some languages which are similar (f.ex. C# and Java). But if you add both JavaScript and Python to that list I'm pretty sure not many common optimization techniques will be left over.
Also keep in mind that premature optimization is often considered bad practice. Developer-hours are much more expensive than buying additional hardware.
However, there is one thing which comes to mind. Over the past decade or so, Object Relational Mappers have become quite popular, and hence they have emerged in pretty much all popular languages. But you have to be careful with them. If not properly configured, it's easy to load tons of data into memory that your code will never use. Keep that in mind. Lazy loading might be of some help here, but your mileage will vary.
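As a rough, framework-free sketch of the lazy-loading idea (the class and field names are made up for illustration and not taken from any particular ORM): defer the expensive load until the data is actually asked for, and do it only once.

    import java.util.List;
    import java.util.function.Supplier;

    // Hypothetical example: an order whose line items are expensive to load.
    class Order {
        private final long id;
        private final Supplier<List<String>> lineItemLoader;
        private List<String> lineItems;   // stays null until first accessed

        Order(long id, Supplier<List<String>> lineItemLoader) {
            this.id = id;
            this.lineItemLoader = lineItemLoader;
        }

        // Only hits the database (or other expensive source) on the first call.
        List<String> getLineItems() {
            if (lineItems == null) {
                lineItems = lineItemLoader.get();
            }
            return lineItems;
        }

        long getId() { return id; }
    }

An ORM's lazy fetching does essentially this behind a proxy; the danger described above comes from the opposite configuration, eager fetching, which pulls whole object graphs into memory whether or not you ever touch them.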
Optimization depends on so many things that answering such a generic question would make this post explode into a full-fledged paper. In my opinion, optimization should be considered on a project-by-project basis, not just a language-by-language basis.

I think you need to split this into two separate questions:
1) Are there language-agnostic ways to find performance problems? YES. Profile, but avoid the myths around that subject.
2) Are there language-agnostic ways to fix performance problems? IT DEPENDS.
A general language-agnostic principle is: do (1) before you do (2).
In other words, Ready-Aim-Fire, not Ready-Fire-Aim.
Here's an example of performance tuning, in C, but it could be any language.

A few things I have learned since asking this:
I/O operations are usually the most expensive for performance. This is especially true of disk and network I/O; network I/O is typically the worst, because waiting for a response from another host means waiting for all the processing and I/O that the remote host does. Only do these operations when absolutely necessary, and consider using a cache where possible.
Database operations can be very expensive because of network/disk I/O and the translation time to and from SQL. Using an in-memory DB or cache can help reduce I/O issues, and some (not all) NoSQL databases can reduce SQL translation time.
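As a minimal sketch of the caching idea from the two points above (the lookup method is a hypothetical stand-in; real code would also need an invalidation strategy and a size bound):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    class CustomerNameService {
        private final Map<Long, String> cache = new ConcurrentHashMap<>();

        String getCustomerName(long customerId) {
            // computeIfAbsent only calls the expensive lookup on a cache miss.
            return cache.computeIfAbsent(customerId, this::loadFromDatabase);
        }

        // Stand-in for a real network or database call.
        private String loadFromDatabase(long customerId) {
            return "customer-" + customerId;
        }
    }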
Only log important information. Logging libraries like log4j help because you can put logging to your heart's desire in your application while assigning each message a log level. Whatever level you set the application to, it will only log messages at that level or higher. This way, if you need to troubleshoot functionality you only have to change a quick config and restart your application to get the additional messages, and when you are done you turn the application back to the default level so that you do not log too much.
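A short sketch of the log-level idea using the log4j 2 API (class names as I recall them; adjust to whatever logging library you actually use):

    import org.apache.logging.log4j.LogManager;
    import org.apache.logging.log4j.Logger;

    class PaymentProcessor {
        private static final Logger log = LogManager.getLogger(PaymentProcessor.class);

        void process(String orderId) {
            // Parameterized messages avoid building the string when DEBUG is disabled.
            log.debug("Starting payment processing for order {}", orderId);

            // ... do the actual work ...

            log.info("Processed order {}", orderId);
        }
    }

With the application configured at INFO, the debug line costs next to nothing; flip the configuration to DEBUG only while troubleshooting, then flip it back.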
Only include functionality that is needed. Additional functionality may be nice to have but can increase processing time, provide additional locations for the application to fail, and costs your team development time that could be spent on more important tasks.
Use and configure your memory manager correctly. Garbage collection routines can kill performance if they are not configured correctly. If your application freezes for a second or two every minute for garbage collection, your customers probably will not be happy.
Profile only after you have discovered a performance issue. Profilers will make the application's performance look worse than it is, because the application and the profiler run on the same host and consume the same hardware resources.
Do not do performance tuning prematurely. There are general practices you can follow that tend to be better for performance in each language, but starting performance tuning in the middle of application development can cost you a lot, because there is still functionality to be added.
This is not necessarily going to help performance, but keep class dependencies to a minimum. When you get into performance tuning there is a good chance you will have to rewrite whole portions of code, and the more dependencies there are on the section you are tuning, the greater the chance you will break something. It can become a domino effect: after fixing the performance issue you have to fix all the dependencies, and possibly the dependencies of those dependencies. A performance tuning exercise estimated at a few hours can quickly turn into months in an application with a lot of dependencies.
If performance is a concern do not use interpreted languages (scripting languages).
Only use the hardware you need. Having a system with a 64-core processor may seem cool, but if your application only runs two or three threads you are getting little benefit from 64 cores. In fact, in rare instances overly excessive hardware can hurt performance, because the system has to be wired to handle all that hardware and your application can end up spending more time switching between cores or processors than actually doing work.
Make any timing metrics you report as granular as possible. Currently you may only need to worry about the number of milliseconds a process takes, but as you make your application faster and faster you may need more granular timings. If version A reports milliseconds and version B reports microseconds, how can you compare performance when version B takes about the same number of milliseconds? Version B may be better, but you just can't tell because version A did not use granular enough metrics.
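For the timing-granularity point, a small illustration: System.currentTimeMillis() is fine for coarse timings, but System.nanoTime() leaves you headroom to keep comparing versions as they get faster (the work being timed is just a placeholder):

    public class Timing {
        public static void main(String[] args) {
            long start = System.nanoTime();
            doWork();
            long elapsedNanos = System.nanoTime() - start;

            // Record the finest unit you have; coarser units can be derived from it.
            System.out.printf("elapsed: %d ns (%.3f ms)%n",
                    elapsedNanos, elapsedNanos / 1_000_000.0);
        }

        private static void doWork() {
            // Placeholder for the operation being measured.
            double x = 0;
            for (int i = 0; i < 1_000_000; i++) {
                x += Math.sqrt(i);
            }
            if (x < 0) System.out.println(x); // keeps the loop from being optimized away
        }
    }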

Related

Where to learn about low-level, hard-core performance stuffs?

This is actually a two-part question:
When people want to squeeze every clock cycle, they talk about pipelines, cache locality, etc.
I have seen these low-level performance techniques mentioned here and there, but I have not seen a good introduction to the subject from start to finish. Any resource recommendations? (Google gave me definitions and papers, where I'd really appreciate some kind of worked examples/tutorials, real-life hands-on material.)
How does one actually measure this kind of thing? With a profiler of some sort? I know we can always change the code, see the improvement, and theorize in retrospect; I am just wondering if there are established tools for the job.
(I know algorithm optimization is where the orders of magnitudes are. I am interested in the metal here)
The chorus of replies is, "Don't optimize prematurely." As you mention, you will get a lot more performance out of a better design than a better loop, and your maintainers will appreciate it, as well.
That said, to answer your question:
Learn assembly. Lots and lots of assembly. Don't MUL by a power of two when you can shift. Learn the weird uses of xor to copy and clear registers. For specific references,
http://www.mark.masmcode.com/ and http://www.agner.org/optimize/
Yes, you need to time your code. On *nix, it can be as easy as time { commands ; } but you'll probably want to use a full-featured profiler. GNU gprof is open source: http://www.cs.utah.edu/dept/old/texinfo/as/gprof.html
If this really is your thing, go for it, have fun, and remember, lots and lots of bit-level math. And your maintainers will hate you ;)
EDIT/REWRITE:
If it is books you need, Michael Abrash did a good job in this area: Zen of Assembly Language, a number of magazine articles, the big black book of graphics programming, etc. Much of what he was tuning for is no longer a problem; the problems have changed. What you will get out of it is a sense of the kinds of things that can cause bottlenecks and the kinds of ways to solve them. Most important is to time everything, and to understand how your timing measurements work so that you are not fooling yourself by measuring incorrectly. Time the different solutions and try crazy, weird ones; you may find an optimization you were not aware of and didn't realize until you exposed it.
I have only just started reading it, but See MIPS Run (early/first edition) looks good so far (note that ARM took over from MIPS as the leader in the processor market, so the MIPS and RISC hype is a bit dated). There are a number of textbooks, old and new, about MIPS. MIPS was designed for performance (at the cost of the software engineer in some ways).
The bottlenecks today fall into the categories of the processor itself and the I/O around it and whatever is connected to that I/O. The insides of the processor chips themselves (for higher-end systems) run much faster than the I/O can handle, so you can only tune so far before you have to go off-chip and wait forever. Shaving half a minute off the walk from the train to your destination is not a worthwhile optimization when the train ride itself was three hours.
It is all about learning the hardware. You can probably stay within the ones-and-zeros world and not have to get into the actual electronics, but without really knowing the interfaces and internals you cannot do much performance tuning. You might re-arrange or change a few instructions and get a little boost, but to make something several hundred times faster you need more than that. Learning a lot of different instruction sets (assembly languages) helps you get into the processors. I would recommend simulating HDL, for example processors at opencores, to get a feel for how some folks do their designs and to get a solid handle on how to really squeeze clocks out of a task. Processor knowledge is big; memory interfaces are a huge deal and need to be learned; media (flash, hard disks, etc.), displays and graphics, networking, and all the types of interfaces between those things matter too. Understanding at the clock level, or as close to it as you can get, is what it takes.
Intel and AMD provide optimization manuals for x86 and x86-64.
http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html/
http://developer.amd.com/documentation/guides/pages/default.aspx
Another excellent resource is Agner Fog's site.
http://www.agner.org/optimize/
Some of the key points (in no particular order):
Alignment; memory, loop/function labels/addresses
Cache; non-temporal hints, page and cache misses
Branches; branch prediction and avoiding branching with compare&move op-codes
Vectorization; using SSE and AVX instructions
Op-codes; avoiding slow running op-codes, taking advantage of op-code fusion
Throughput / pipeline; re-ordering or interleaving op-codes that perform separate tasks, avoiding partial stalls and saturating the processor's ALUs and FPUs
Loop unrolling; performing multiple iterations for a single "loop comparison, branch" (see the sketch after this list)
Synchronization; using atomic op-codes (or the LOCK prefix) to avoid high-level synchronization constructs
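As a rough illustration of the loop-unrolling item above in a high-level language (compilers and JITs frequently do this for you, so treat it as a sketch of the idea rather than advice to hand-unroll everything):

    public class Unroll {
        // Straightforward loop: one addition and one loop test per element.
        static long sum(int[] a) {
            long total = 0;
            for (int i = 0; i < a.length; i++) {
                total += a[i];
            }
            return total;
        }

        // Unrolled by four: fewer branch tests, more work per iteration.
        static long sumUnrolled(int[] a) {
            long total = 0;
            int i = 0;
            for (; i + 3 < a.length; i += 4) {
                total += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
            }
            for (; i < a.length; i++) {   // handle the leftover elements
                total += a[i];
            }
            return total;
        }
    }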
Yes, measure, and yes, know all those techniques.
Experienced people will tell you "don't optimize prematurely", which I translate as simply "don't guess".
They will also say "use a profiler to find the bottleneck", but I have a problem with that. I hear lots of stories of people using profilers and either liking them a lot or being confused with their output.
SO is full of them.
What I don't hear a lot of is success stories, with speedup factors achieved.
The method I use is very simple, and I've tried to give lots of examples, including this case.
I'd suggest Optimizing Subroutines in Assembly Language: An Optimization Guide for x86 Platforms.
It's quite heavy stuff though ;)

Is Performance Always Important?

Since I am a Lone Developer, I have to think about every aspect of the systems I am working on. Lately I've been thinking about performance of my two websites, and ways to improve it. Sites like StackOverflow proclaim, "performance is a feature." However, "premature optimization is the root of all evil," and none of my customers have complained yet about the sites' performance.
My question is, is performance always important? Should performance always be a feature?
Note: I don't think this question is the same as this one, as that poster is asking when to consider performance and I am asking if the answer to that question is always, and if so, why. I also don't think this question should be CW, as I believe there is an answer and reasoning for that answer.
Adequate performance is always important.
Absolute fastest possible performance is almost never important.
It's always worth keeping an eye on performance and being aware of anything outrageously non-optimal that you're doing (particularly at a design/architecture level) but that's not the same as micro-optimising every line of code.
Performance != Optimization.
Performance is a feature indeed, but premature optimization will cost you time and will not yield the same result as when you optimize the parts that need optimization. And you can't really know which parts need optimization until you can actually profile something.
Performance is the feature that your clients will not tell you about if it's missing, unless it's really painfully slow and they're forced to use your product. Existing customers may report it eventually, but new customers will simply not bother if the performance isn't there.
You need to know what performance you need, and formulate it as a requirement. Then, you have to meet your own requirement.
That 'root of all evil' quote is almost always misused and misunderstood.
Designing your application to perform well can mostly be done with just good design. Good design != premature optimization, and it's utterly ridiculous to write crap code and blow off doing a better job on the design as an 'evil' waste. Now, I'm not specifically talking about you here... but I see people do this a lot.
It usually saves you time to do a good job on the design. If you emphasize that, you'll get better at it... and get faster and faster at writing systems that perform well from the start.
Understanding what kinds of structures and access methods work best in certain situations is key here.
Sure, if your app becomes truly massive or has insane speed requirements, you may find yourself doing tricked-out optimizations that make your code uglier or harder to maintain... and it would be wrong to do those things before you need to.
But that is absolutely NOT the same thing as making an effort to understand and use the right algorithms or data patterns or whatever in the first place.
Your users are probably not going to complain about bad performance if it's bearable. They possibly wouldn't even know it could be faster. Reacting to complaints as a primary driver is a bad way to operate. Sure, you need to address complaints you receive... but a lack of them does not mean there isn't a problem. The fact that you are considering improving performance is a bit of an indicator right there. Was it just a whim, or is some part of you telling you it should be better? Why did you consider improving it?
Just don't go crazy doing unnecessary stuff.
Keep performance in mind but given your situation it would be unwise to spend too much time up front on it.
Performance is important but it's often hard to know where your bottleneck will be. Therefore I'd suggest planning to dedicate some time to this feature once you've got something to work with.
Thus you need to set up metrics that are important to your clients and to you. Keep and analyse these measurements. Then estimate how long each improvement would take and how much it would cost to implement. Now you can aim at getting as much bang for your buck/time as possible.
If it's the web, it would be wise to note your page size and performance using Firebug + YSlow and/or Google Page Speed. Again, know what applies to a small site like yours and what only applies to Yahoo and Google.
Jackson’s Rules of Optimization:
Rule 1. Don’t do it.
Rule 2 (for experts only). Don’t do it
yet— that is, not until you have a
perfectly clear and unoptimized
solution.
—M. A. Jackson
Extracted from Code Complete 2nd edition.
To give a generalized answer to a general question:
First make it work, then make it right, then make it fast.
http://c2.com/cgi/wiki?MakeItWorkMakeItRightMakeItFast
This puts a more constructive perspective on "premature optimization is the root of all evil".
So to parallel Jon Skeet's answer, adequate performance (as part of making something work, and making it right) is always important. Even then it can often be addressed after other functionality.
Jon Skeet's 'adequate' nails it, with the additional provision that for a library you don't yet know what's adequate, so it's better to err on the safe side.
It is one of the many stakes you must not get wrong, but the quality of your app is largely determined by the weakest link.
Performance is definitely always important in a certain sense - maybe not the one you mean: namely in all phases of development.
In Big O notation, what's inside the parentheses is largely decided by design, both component isolation and data storage. Choice of algorithm will usually only affect best/worst case behavior (unless you start with decidedly substandard algorithms). Code optimizations will mostly affect the constant factor, which shouldn't be neglected either.
But that's true for all aspects of code: in any stage, you have a good chance to fail any aspect - stability, maintainability, compatibility etc. Performance needs to be balanced, so that no aspect is left behind.
In most applications, 90% or more of execution time is spent in 10% or less of the code. There is usually little use in optimizing code outside those 10%.
performance is only important to the extent that developing the performance improvement takes less time than the total amount of time that will be saved for the user(s).
the result is that if you're developing something for millions... yeah it's important to save them time. if you're coding up a tool for your own use... it might be more trouble than it's worth to save a minute or even an hour or more.
(this is clearly not a rule set in stone... there are times when performance is truly critical no matter how much development time it takes)
There should be a balance to everything. Cost (or time to develop) vs Performance for instance. More performance = more cost. If a requirement of the system being built is high performance then the cost should not matter, but if cost is a factor then you optimize within reason. After a while, your return on investment suffers in that more performance does not bring in more returns.
The importance of performance is IMHO highly correlated with your problem set. If you are creating a site expecting heavy load and a lot of server-side processing, then you might want to put more time into performance (otherwise your site might end up being unusable). However, for most applications the time put into optimizing the performance of a website is not going to pay off; users won't notice the difference.
So I guess it breaks down to this:
Will users notice the improvements?
How does this improvement compare to competing sites?
If users will notice AND the improvement would be enough to differentiate you from the competition - performance is an important feature - otherwise not so much. (To a point - I don't recommend ignoring it entirely - you don't want your site to turtle along after all).
No. Fast enough is generally good enough.
It's not necessarily true, however, that your client's ideas about "fast enough" should trump your own. If you think it's fast enough and your client doesn't, then yes, you need to accommodate your ideas to theirs. But if your client thinks it's fast enough and you don't, you should seriously consider going with your opinion, not theirs (since you may be more knowledgeable about performance standards in the wider world).
How important performance is depends largely and foremost on what you do.
For example, if you write a library that can be used in any environment, this can hardly ever have too much performance. In some environments, a 10% performance advantage can be a major feature for a library.
If you, OTOH, write an application, there's always a point where it is fast enough. Users will neither notice nor care whether a button press reacts within 0.05 or 0.2 seconds, even though that's a factor of 4.
However, it is always easier to get working code faster, than it is to get fast code working.
No. Performance is not important.
Lack of performance is important.
Performance is something to be designed in from the outset, not tacked on at the end. For the past 15 years I have been working in the performance engineering space, and the cause of most of the project failures I work on is a lack of requirements on performance. A couple of posts have noted "fast enough" as an observation, and whether your expectation matches that of your clients, but what about when your client, your architectural team, your platform engineering team, your functional test team, your performance test team, and your operations team all have different expectations on performance, none of which have been committed to stone and measured against? Bad magic, to be certain.
Capture those expectations on the part of your clients. Commit them to specific, objective, measurable requirements that you can evaluate at each stage of production of your software. Expectations may not be uniform: one section of your app/code may need to be faster than others, and not every customer will have the same expectations of what is acceptable. Having this information will force you to confront decisions in the design and implementation that you may have overlooked in the past, and it will result in a product that is a better match to your clients' expectations.

Is there a relation between static code analysis and application performance

My Question:
Performance tests are generally done after an application is integrated with various modules and ready for deploy.
Is there any way to identify performance bottlenecks during the development phase? Does code analysis give any hints about performance?
It all depends on the rules you run during code analysis, but I don't think you can prevent performance bottlenecks just by code analysis.
From my experience, it looks like performance problems are usually quite complicated, and to find the real problems you have to run performance tests.
No, except in very minor cases (eg for Java, use StringBuilder in a loop rather than string appends).
The reason is that you won't know how a particular piece of code will affect the application as a whole, until you're running the whole application with relevant dataset.
For example: changing bubblesort to quicksort wouldn't significantly affect your application if you're consistently sorting lists of a half-dozen elements. Or if you're running the sort once, in the middle of the night, and it doesn't delay other processing.
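To illustrate the StringBuilder example mentioned above: repeated += on a String copies the whole accumulated string each time (quadratic overall), while StringBuilder appends into one growable buffer:

    public class Concat {
        // O(n^2): each += allocates a new String and copies everything so far.
        static String joinSlow(String[] parts) {
            String result = "";
            for (String p : parts) {
                result += p;
            }
            return result;
        }

        // O(n): appends in place, one final copy at the end.
        static String joinFast(String[] parts) {
            StringBuilder sb = new StringBuilder();
            for (String p : parts) {
                sb.append(p);
            }
            return sb.toString();
        }
    }

A static analyser can flag the first form, but, as noted above, it cannot tell you whether that loop runs over six strings once a day or six million strings per request.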
If we are talking .NET, then yes and no... FxCop (or built-in code analysis) has a number of rules in it that deal with performance concerns. However, this list is fairly short and limited in nature.
Having said that, there is no reason that FxCop could not be extended with a lot more rules (heuristic or otherwise) that catch potential problem areas and flag them. It's simply a fact that nobody (that I know of) has put significant work into this (yet).
Generally, no, although from experience I can look at a system I've never seen before and recognize some design approaches that are prone to performance problems:
How big is it, in terms of lines of code, or number of classes? This correlates strongly with performance problems caused by over-design.
How many layers of abstraction are there? Each layer is a chance to spend more cycles than necessary, and this effect compounds, especially if each operation is perceived as being "pretty efficient".
Are there separate data structures that need to be kept in agreement? If so, how is this done? If there is an attempt, through notifications, to keep the data structures tightly in sync, that is a red flag.
Of the categories of input information to the system, does some of it change at low frequency? If so, chances are it should be "compiled" rather than "interpreted". This can be a huge win both in performance and ease of development.
A common motif is this: Programmer A creates functions that wrap complex operations, like DB access to collect a good chunk of information. Programmer A considers this very useful to other programmers, and expects these functions to be used with a certain respect, not casually. Programmer B appreciates these powerful functions and uses them a lot because they get so much done with only a single line of code. (Programmers B and A can be the same person.) You can see how this causes performance problems, especially if distributed over multiple layers.
Those are the first things that come to mind.

Does the advent of MultiCore architectures affect me as a software developer?

As a software developer dealing mostly with high-level programming languages I'm not sure what I can do to appropriately pay attention to the upcoming omni-presence of multicore computers. I write mostly ordinary and non-demanding applications, nevertheless I think it is important to know if I need to change any programming paradigms or even language to master the future.
My question therefore:
How to deal with increasing multicore presence in day-by-day hacking?
Herb Sutter wrote about it in 2005: The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software
Most problems do not require a lot of CPU time. Really, single cores are quite fast enough for many purposes. When you do find your program is too slow, first profile it and look at your choice of algorithms, architecture, and caching. If that doesn't get you enough, try to divide the problem up into separate processes. Often this is worth doing simply for fault isolation and so that you can understand the CPU and memory usage of each process. Also, normally each process will run on a specific core and make good use of the processor caches, so you won't have to suffer the substantial performance overhead of keeping cache lines consistent. If you go for a multi-process design and still find the problem needs more CPU time than you can get from the machine you have, you are well placed to extend it to run over a cluster.
There are situations where you need multiple threads within the same address space, but beware that threads are really hard to get right. Race conditions, especially in non-safe languages, sometimes take weeks to debug; often, simply adding tracing or running under a debugger will change the timings enough to hide the problem. Simply putting locks everywhere often means you get a lot of locking overhead and sometimes so much lock contention that you don't really get the concurrency advantage you were hoping for. Even when you've got the locking right, you then need to profile to tune for cache coherency. Ultimately, if you want to really tune some highly concurrent code, you'll probably end up looking at lock-free constructs and more complex locking schemes than those in current multi-threading libraries.
Learn the benefits of concurrency, and the limits (e.g. Amdahl's law).
So you can, where possible, exploit the only route for higher performance that is going to be open. There is a lot of innovative work happening on easier approaches (futures and task libraries), and old work being rediscovered (functional languages and immutable data).
The free lunch is over, but that does not mean that there is nothing to exploit.
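To make Amdahl's law concrete, a tiny worked example (the numbers are purely illustrative): if a fraction p of the work can be parallelized, the best possible speedup on n cores is 1 / ((1 - p) + p / n).

    public class Amdahl {
        // Upper bound on speedup when a fraction p of the work parallelizes over n cores.
        static double speedup(double p, int n) {
            return 1.0 / ((1.0 - p) + p / n);
        }

        public static void main(String[] args) {
            // Even with 95% of the work parallel, the serial 5% caps the speedup at 20x.
            System.out.println(speedup(0.95, 8));     // ~5.9
            System.out.println(speedup(0.95, 64));    // ~15.4
            System.out.println(speedup(0.95, 1024));  // ~19.6
        }
    }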
In general, become very friendly with threading. It's a terrible mechanism for parallelization, but it's what we have.
If you do work with .NET, look at the Parallel Extensions. They allow you to easily accomplish many parallel programming tasks.
To benefit from more than just one core you should consider parallelizing your code. Multiple threads, immutable types, and a minimum of synchronization are your new friends.
I think it will depend on what kind of applications you're writing.
Some kinds of apps benefit more than others from running on a multi-core CPU.
If your application can benefit from the multi-core fact, then you should be ready to go parallel.
The free lunch is over; that is: in the past, your application became faster whenever a new CPU was released, and you didn't have to put any effort into your application to get that extra speed.
Now, to take advantage of the capabilities a multi-core CPU offers, you have to make sure your application can take advantage of them. That is: you have to see which tasks can be executed multithreaded/concurrently, and that brings some issues to the table...
Learn Erlang/F# (depending on your platform)
Prefer immutable data structures, their use makes software easier to understand not only in concurrent programs.
Learn the tools for concurrency in your language (e.g. java.util.concurrent, JCIP).
Learn a functional language (e.g Haskell).
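As a small sketch of the java.util.concurrent suggestion above: a fixed thread pool fanning independent tasks out and collecting the results (the work itself is a placeholder):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class PoolExample {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(
                    Runtime.getRuntime().availableProcessors());
            try {
                List<Future<Long>> results = new ArrayList<>();
                for (int task = 0; task < 8; task++) {
                    final long n = task;
                    // Each Callable is independent work; no shared mutable state to lock.
                    Callable<Long> job = () -> slowSquare(n);
                    results.add(pool.submit(job));
                }
                long total = 0;
                for (Future<Long> f : results) {
                    total += f.get();   // blocks until that task has finished
                }
                System.out.println("total = " + total);
            } finally {
                pool.shutdown();
            }
        }

        private static long slowSquare(long n) throws InterruptedException {
            Thread.sleep(100);   // stand-in for real work
            return n * n;
        }
    }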
I've been asked the same question, and the answer is, "it depends". If you're Joe WinForms, maybe not so much. If you're writing code that must be performant, yes. One of the biggest problems I can see with parallel programming is this: if something can't be parallelized, and you lie and tell the runtime to do it in parallel anyway, it's not going to crash, it's just going to do things wrong, and you'll get crap results and blame the framework.
Learn OpenMP and MPI for C and C++ code.
OpenMP also applies to other languages, such as Fortran, I suppose.
Write smaller programs.
Other languages/styles will let you do multithreading better (though multithreading is still really hard in any language), but the big benefit for regular developers, IMHO, is the ability to run lots of smaller programs concurrently to accomplish some much larger task.
So, get in the habit of breaking your problems down into independent components that can be run whenever you want.
You'll build more maintainable software too.

Can you estimate an application's performance before testing?

It's a tricky question I was asked the other day... We're working on a pretty complex telephony (SIP) application with mixed C++ and PHP code with MySQL databases and several open source components.
A telecom engineer asked us to estimate the performance of the application (which is not ready yet). He went like 'well, you know how many packets can pass through the Linux kernel per second, plus you might know how quick your app is, so tell me how many calls will pass through your stuff per second'.
Seems nonsense to me, as there are a million scenarios that might happen (well, literally...)
However... is there a way to estimate application performance (knowing the hardware it will run on, being able to run standard benchmarks on it, etc) before actual testing?
You certainly can bound the problem with upper (max throughput) limits. There is nothing nonsense about that. In fact, not knowing that stuff indicates a pretty haphazard approach to a problem - especially in the telephony world.
You can work through the problem yourself - what is the minimum "work" you have to accomplish for a transaction or whatever unit of task you have in your app?
Some messages to and from, some processing, and a database hit, for example? Getting information on the individual pieces will give you an idea of the fastest possible throughput. If you load up the system and see significantly lower performance, then you can take time to figure out where you are losing throughput to inefficient algorithms, etc.
EDIT
To do this exercise you need to know all the steps your app does for each use case. Then you can identify the max throughput for each use case. You should definitely know this stuff prior to release and going live.
I'm ignoring the worst case analysis as that - as you point out - is quite a bit harder.
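A back-of-the-envelope version of that exercise, with entirely made-up numbers just to show the shape of the calculation: add up the unavoidable per-call costs and invert to get an upper bound on throughput.

    public class ThroughputBound {
        public static void main(String[] args) {
            // Hypothetical per-call costs, in milliseconds.
            double networkRoundTrips = 2 * 0.5;  // two message exchanges
            double databaseHit       = 2.0;      // one indexed query
            double processing        = 0.3;      // parsing, routing logic

            double perCallMillis = networkRoundTrips + databaseHit + processing; // 3.3 ms
            double callsPerSecondPerWorker = 1000.0 / perCallMillis;             // ~303

            int workers = 16; // threads or processes handling calls concurrently
            System.out.printf("upper bound: ~%.0f calls/s%n",
                    callsPerSecondPerWorker * workers);
        }
    }

Real throughput will come in lower (contention, GC pauses, bursty arrivals), but if a load test lands far below a bound like this you know to go looking for waste.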
See Capacity Planning for Web Performance: Metrics, Models, and Methods. There are also some tools that can do this sort of discrete event simulation:
Hyperformix
SimPy
WikiPedia list of simulation tools
This stuff ain't easy, and the commercial tools will cost ya. The Capacity Planning book comes with a CD with lots of Excel workbook templates and examples of models that can jump start you.
Good luck :)
If you really have to answer this you could say something like this:
"I don't know off the top of my head. I am will to estimate this for you but it will take time. Obviously the accuracy of my answer depends upon how much effort (I.E. time) I put into calculating my estimate. How much time should I put into calculating my estimate?"
Put the burden back on them. If they really want an accurate answer, they're going to have to let you build at least some test applications that can simulate the actual environment.
You can spike to measure performance. Your whole system may not be working yet, but you know how the parts are intended to fit together. You can whip something up in a few hours that does the same kind of work as the final app will, across all the layers, and use it to measure performance of your design.
Remember: prototypes are broad, spikes are deep.
You should do the estimate. An estimate won't give you the right answer, but it will make you think about the problem. Right now it sounds like you're coding and hoping that everything will be OK, or you are in panic mode and feel you don't have time for estimates.
Spend some time thinking about it. Analyse the important use cases. Think about the memory you may need; think about database access; think about network access (local and remote). These will affect the performance of your system. Get the whole team together to do this.
Regularly measure your system's performance during development for these important use cases. Mock up components/other systems if you have to. Analyse the results and compare them to your estimate. Maybe components are memory/database/network bound. Maybe you need more memory, less database access, simpler queries, caching. You don't have to make these changes straight away, but you do know how your system operates and what you need to do.
Result: Fewer nasty surprises at system test. Less panic as the release date looms.
You can definitely do capacity planning in advance, but the quality of the estimate will depend on the quality of the data available.
The best estimate is to build the system in test, run simulated workloads, then predict capacity as a function of performance requirements and workload. These 3 form a prediction space - given 2 of the 3, you can predict the third:
Given performance requirements and capacity (i.e. hardware) you can calculate the workload you can handle.
Given performance requirements and workload, you can calculate the capacity (i.e. hardware) that you need.
Given workload and capacity, you can predict your expected performance (a small sketch of this follows below).
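A tiny sketch of that third relationship (predicting expected behaviour from workload and capacity), using a deliberately simplified model in which every request needs a fixed amount of service time on one of several identical servers; all numbers are hypothetical:

    public class CapacityModel {
        public static void main(String[] args) {
            double arrivalsPerSecond  = 200.0;  // workload
            double serviceTimeSeconds = 0.02;   // time one server spends per request
            int servers = 8;                    // capacity

            // Fraction of total server capacity the workload consumes.
            double utilization = arrivalsPerSecond * serviceTimeSeconds / servers; // 0.5

            if (utilization >= 1.0) {
                System.out.println("Overloaded: queues grow without bound.");
            } else {
                System.out.printf(
                        "Utilization ~%.0f%%; response time stays near the %.0f ms service time "
                        + "while utilization is comfortably below 100%%.%n",
                        utilization * 100, serviceTimeSeconds * 1000);
            }
        }
    }

Real systems queue and burst, so a proper model adds queueing effects (see the capacity planning references elsewhere in this thread), but even this crude check tells you whether the hardware is in the right ballpark.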
This is true in some domains, but unless you are an expert in that domain you won't have any idea. For example, I write code to control industrial robots. The speed is limited by the robot's motion, not by the execution speed of the code. Knowing how fast the robot is and how far it has to go, we can make fairly good estimates of "speed". I'd have no idea how to estimate time for your application.
