Supplying 64 bit specific versions of your software - performance

Would I expect to see any performance gain by building my native C++ Client and Server into 64 bit code?
What sort of applications benefit from having a 64 bit specific build?
I'd imagine anything that makes extensive use of long would benefit, or any application that needs a huge amount of memory (i.e. more than 2 GB), but I'm not sure what else.

Architectural benefits of Intel x64 vs. x86
larger address space
a richer register set
can link against external libraries or load plugins that are 64-bit
Architectural downside of x64 mode
all pointers (and thus the many data structures built out of them) take up 2x the memory, cutting the effective processor cache size roughly in half in the worst case
cannot link against external libraries or load plugins that are 32-bit
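A minimal illustration of the pointer-size point (not from the original answer, just a sketch you can compile for both targets):

```cpp
#include <cstdio>

struct Node {
    Node* next;   // 4 bytes in a 32-bit build, 8 bytes in a 64-bit build
    int   value;  // alignment padding can also grow the struct on x64
};

int main() {
    // Typically prints 4 and 8 for an x86 build, 8 and 16 for an x64 build,
    // which is why pointer-chasing code can lose cache effectiveness on x64.
    std::printf("sizeof(void*) = %zu\n", sizeof(void*));
    std::printf("sizeof(Node)  = %zu\n", sizeof(Node));
}
```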
In applications I've written, I've sometimes seen big speedups (30%) and sometimes seen big slowdowns (> 2x) when switching to 64-bit. The big speedups have happened in number crunching / video processing applications where I was register-bound.
The only big slowdown I've seen in my own code when converting to 64-bit is from a massive pointer-chasing application where one compiler made some really bad "optimizations". Another compiler generated code where the performance difference was negligible.
Benefit of porting now
Writing 64-bit-compatible code isn't that hard 99% of the time, once you know what to watch out for. Mostly, it boils down to using size_t and ptrdiff_t instead of int when referring to memory addresses (I'm assuming C/C++ code here). It can be a pain to convert a lot of code that wasn't written to be 64-bit-aware.
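To make that concrete, here is a minimal sketch (illustrative only) of the kind of change involved:

```cpp
#include <cstddef>   // std::size_t, std::ptrdiff_t

// 32-bit habit: using int for sizes and offsets breaks once buffers or
// pointer differences no longer fit in 32 bits.
int find_first_bad(const char* buf, int len, char c) {
    for (int i = 0; i < len; ++i)
        if (buf[i] == c) return i;
    return -1;
}

// 64-bit-aware version: sizes are size_t, signed offsets are ptrdiff_t.
std::ptrdiff_t find_first(const char* buf, std::size_t len, char c) {
    for (std::size_t i = 0; i < len; ++i)
        if (buf[i] == c) return static_cast<std::ptrdiff_t>(i);
    return -1;
}
```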
Even if it doesn't make sense to make a 64-bit build for your application (it probably doesn't), it's worth the time to learn what it would take to make the build so that at least all new code and future refactorings will be 64-bit-compatible.

Before working too hard on figuring out whether there is a technical case for the 64-bit build, you must verify that there is a business case. Are your customers asking for such a build? Will it give you a definitive leg up in competition with other vendors? What is the cost for creating such a build and what business costs will be incurred by adding another item to your accounting, sales and marketing processes?
While I recognize that you need to understand the potential for performance improvements before you can get a handle on competitive advantages, I'd strongly suggest that you approach the problem from the big picture perspective. If you are a small or solo business, you owe it to yourself to do the appropriate due diligence. If you work for a larger organization, your superiors will greatly appreciate the effort you put into thinking about these questions (or will consider the whole issue just geeky excess if you seem unprepared to answer them).
With all of that said, my overall technical response would be that the vast majority of user-facing apps will see no advantage from a 64-bit build. Think about it: how much of the performance problem in your current app comes from being processor-bound (or RAM-access bound)? Is there a performance problem in your current app at all? (If not, you probably shouldn't be asking this question.)
If it is a Client/Server app, my bet is that network latency contributes far more to the performance on the client side (especially if your queries typically return a lot of data). Assuming that this is a database app, how much of your performance profile is due to disk latency times on the server? If you think about the entire constellation of factors that affect performance, you'll get a better handle on whether your specific app would benefit from a 64-bit upgrade and, if so, whether you need to upgrade both sides or whether all of your benefit would derive just from the server-side upgrade.

Not much else, really. Though writing a 64-bit app can have some advantages to you, as the programmer, in some cases. A simplistic example is an application whose primary focus is interacting with the registry. As a 32-bit process, your app would not have access to large swaths of the registry on 64-bit systems.
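As a hedged sketch of what that looks like in code (the key name here is just for illustration): a 32-bit process on 64-bit Windows is redirected to the 32-bit registry view unless it explicitly asks for the 64-bit one.

```cpp
#include <windows.h>
#include <cstdio>

int main() {
    HKEY key = nullptr;
    // Without KEY_WOW64_64KEY, a 32-bit process opening HKLM\SOFTWARE is
    // silently redirected to HKLM\SOFTWARE\WOW6432Node on 64-bit Windows.
    LONG rc = RegOpenKeyExW(HKEY_LOCAL_MACHINE, L"SOFTWARE\\Microsoft",
                            0, KEY_READ | KEY_WOW64_64KEY, &key);
    if (rc == ERROR_SUCCESS) {
        std::puts("Opened the 64-bit registry view from a 32-bit process.");
        RegCloseKey(key);
    } else {
        std::printf("RegOpenKeyExW failed: %ld\n", rc);
    }
}
```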

Continuing mdbritt's comment, building for 64-bit makes far more sense [currently] if it's a server build, or if you're distributing to Linux users.
It seems that far more Windows workstations are still 32-bit, and there may not be a large customer base for a new build.
On the other hand, many server installs are 64-bit now: RHEL, Windows, SLES, etc. NOT building for them would be cutting off a lot of potential usage, in my opinion.
Desktop Linux users are also likely to be running the 64-bit versions of their favorite distro (most likely Ubuntu, SuSE, or Fedora).
The main obvious benefit of building for 64-bit, however, is that you get past the roughly 2-3 GB per-process memory barrier of 32-bit builds.
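A rough way to see that barrier in action (illustrative sketch only): build the snippet below as 32-bit and as 64-bit. The 32-bit process normally cannot satisfy a 3 GB allocation because it has no room in its address space, while the 64-bit build usually can, subject to physical memory and commit limits.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>

int main() {
    // 3 GB still fits in a 32-bit size_t, but a 32-bit process usually
    // cannot provide that much contiguous virtual address space.
    const std::size_t three_gb = 3u * 1024u * 1024u * 1024u;
    void* p = std::malloc(three_gb);
    std::printf("3 GB allocation %s\n", p ? "succeeded" : "failed");
    std::free(p);
}
```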

According to this web page, you benefit most from the extra general-purpose registers of a 64-bit CPU if you use a lot of loops and/or deeply nested loops.

You can expect gains thanks to the additional registers and the new parameter-passing convention (which is really tied to those additional registers, since more arguments are passed in registers rather than on the stack).

Related

Are there memory leaks in Linux?

This question is purely theoretical.
I was wondering whether the Linux source code itself could have memory leaks, and how such leaks are debugged, considering that it is Linux, after all, that manages each program's memory.
I obviously understand that Linux, being written in C, has to manage malloc and free itself. What I don't understand is how we measure the operating system's own memory leaks.
Note that this question is not Linux-specific; it also covers the corresponding issues in Windows and Mac OS X (Darwin).
Quite frequently, non-mainstream drivers and the staging tree have memory leaks. Follow the LKML and you can see occasional fixes for corner cases in the networking code that mishandle lists of SKBs.
Due to the nature of the kernel, most of this work is code review and refactoring, but work is ongoing to build more tools:
http://www.linuxfoundation.org/en/Google_Summer_of_Code#kmemtrace_-_Kernel_Memory_Profiler
In certain cases you can use frameworks like User-mode Linux and then use conventional tools like Valgrind to attempt to peer into the running code:
http://user-mode-linux.sourceforge.net/
The implementation of malloc and free (actually brk/sbrk, since malloc and free are implemented by libc in-process) is not magical or special - it's just code, like anything else, and there are data structures behind it that describe the mappings.
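To underline the "it's just code" point, here is a deliberately naive bump allocator built directly on sbrk (a POSIX-only sketch: it never reuses freed memory, is not thread-safe, and exists purely to show there is no magic underneath):

```cpp
#include <unistd.h>   // sbrk
#include <cstdint>    // intptr_t
#include <cstddef>
#include <cstdio>

// Grow the program break and hand out the new space. Real malloc
// implementations keep free lists/bins and also use mmap, but underneath
// it is ordinary bookkeeping code like this.
void* toy_alloc(std::size_t size) {
    void* start = sbrk(static_cast<intptr_t>(size));
    return (start == reinterpret_cast<void*>(-1)) ? nullptr : start;
}

int main() {
    char* p = static_cast<char*>(toy_alloc(64));
    if (p) {
        std::snprintf(p, 64, "allocated %d bytes via sbrk", 64);
        std::puts(p);
    }
}
```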
If you want to test correct behavior, one way is to write test programs in user-space that are known to allocate then free all their memory correctly. Run the app, then check the internal memory allocation structures in kernel mode using a debugger (or better yet, make this check a debug assert on process shutdown).
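A minimal user-space probe of that kind might look like the following (purely illustrative; the interesting part is the kernel-side check afterwards, not this program):

```cpp
#include <cstdlib>

// Allocate and free a known pattern of blocks, then exit cleanly.
// After the process terminates, the kernel's per-process bookkeeping
// should show that every mapping it created has been torn down.
int main() {
    for (int round = 0; round < 1000; ++round) {
        void* blocks[64];
        for (int i = 0; i < 64; ++i)
            blocks[i] = std::malloc(4096);
        for (int i = 0; i < 64; ++i)
            std::free(blocks[i]);
    }
    return 0;
}
```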
All software has bugs, including operating systems. Some of those bugs will result in memory leaks.
Linux has a kernel debugger to help track down these things, but one has to discover that a leak exists before one can track it down. Usually, once a bug has been discovered and can be replicated at will, it becomes much easier to fix (relatively speaking! Obviously you need a good coder to do the job). The hard part is finding the bugs in the first place and creating reliable test cases that demonstrate them. This is where you need a skilled QA team.
So I guess the short version of this answer is that good QA is as important as good coding.

Is IntelliJ IDEA Ultimate slower than the Community edition?

The Ultimate edition has many more features, so I imagine it loads more slowly, consumes more memory, and is generally less responsive. Actually, I don't care about the load time all that much, but memory and responsiveness (which are related) are of more importance.
Did you notice any difference between these editions?
If you are working on the same project and using the same features, you should not notice any differences.
However, if your project uses several languages (like Ruby, Python, Groovy, Scala), load time will be longer since IDEA will have to load plug-ins for these languages.
Also, the more features your project uses, the more memory IDEA will need. If you don't hit your hardware limit and have a decent rig, you won't notice it at all. If you are short on memory, IDEA may become slower because of frequent garbage collections and operating-system swapping.
So, generally it would require more memory since you are going to use more features, but if it doesn't swap and you have enough free RAM, you will not have performance problems. Also note that you can disable all the unused plug-ins to save resources.
Working on the same Android application, I did not find the Ultimate edition to be slower. Its load time is the same as well.

Testing perceived performance

I recently got a shiny new development workstation. The only disadvantage of this is that the desktop apps I'm developing now run very, very fast, and so I fear that parts of the code that would be annoyingly slow on end users' machines will go unnoticed during my testing.
Is there a good way to slow down an application for testing? I've tried searching around, but all of the results I've been able to find seem pretty fiddly to set up (e.g., manually setting up a high-priority CPU-bound task on the same CPU core as the target app, or running a background process that rapidly interrupts and resumes the target app), and I don't know if the end result is actually a good representation of running on a slower computer (with its slower CPU, slower RAM, slower disk I/O...).
I don't think that this is a job for a profiler; I'm interested in the user's perception of end-to-end performance rather than in where the time goes for particular operations.
Set up a virtual machine, give it as little RAM as needed, and you can also have it use one, two, or more CPUs. I like VirtualBox myself: install your app and test with different RAM configurations.
Personally, I'd get an old used crappy computer that is typical of what the users have and test on that. It should be cheap and you will see pretty fast how bad things are.
I think the only way to deal with this is through proper end-user testing, i.e. get yourself a "typical" system for testing and use that to identify any perceptible performance bottlenecks.
You can try out either Virtual PC or VMWare Player/Workstation, load an OS onto it, and then throttle back the resources. I know that with any of those tools you can reduce the memory to whatever you'd like. You can also specify the number of cores you want to use. You might even be able to adjust the clock speed in VMWare Workstation... I'm not sure.
I upvoted SQLMenace's answer, but I also think that profiling needs to be mentioned, no matter how quickly the code is executing - you'll still see what's taking the most time. If you find yourself with some free time, I think profiling and investigating the results is a good way to spend it.

When to support Windows 64?

I'm currently providing 32 bit Windows music software. Some of my users are asking for 64-bit support. I plan to eventually, but porting is a big job, and I have plenty of other important feature requests too. I need to allocate my limited time wisely.
How much market share do 64-bit operating systems hold, and what's the trend?
No better time than now. As the need for more RAM increases, 64-bit Windows versions will become more and more prevalent. Play around a bit with Google Trends and you will see a clear uptick in people looking into it. As explained in "Dude, Where's My 4 Gigabytes of RAM?", the need for the everyday user to move to a 64-bit OS is only going to keep growing.
Edit in response to Jeff's comment
I understand; any team has to balance upgrades and bug fixes by priority, and that will always be a difficult balance to strike. The benefits of a 64-bit version will only continue to grow!
Good luck striking the right balance!
Why are they asking for 64-bit support? Does your 32-bit software not work on Win64, or are they assuming they need a special version when in fact they'd be fine with the 32-bit version? In my experience, Win64's support for 32-bit programs is excellent, and it's likely to continue to exist for the foreseeable future.
If your software doesn't work, and it's not due to a fundamental limitation like half the logic being in a device driver, then making it work as a 32-bit executable may be easier than you think.
(Forgive me if I'm teaching you to suck eggs. 8-)
There are 3 common things that would be good reasons to port to Win64:
your product includes a driver - in this case to work at all on a Win64 system, at least the driver must be ported.
your product has shell or IE integration - since on a Win64 system the user is likely using the 64-bit version of Explorer and IE, you'll need 64-bit plug-ins to integrate with those. (you should continue to package and install the 32-bit versions so things will still work if the user finds themselves in a 32-bit file manager or IE instance).
your product would noticeably benefit from increased address space - if your product consumes a lot of data (like database or number crunching apps often do), your application will have far more virtual address space available on a Win64 system and can often use that to advantage.
Note that there may be other good reasons to port, but these are the common ones. Also note that porting for one of the above reasons does not necessarily mean that everything has to be ported. For example, you may be able to get away with just porting your device driver.
If none of these reasons fit, then it might just be your users wanting something for no good reason - educating them may help. But if it starts impacting sales, you might find yourself in a position where you have to port just to make them happy even if there's no good technical reason (hopefully your customers aren't that unreasonable and will listen to sound technical advice).
But even if you don't port your code to Win64 there's no reason not to test and support your application on Win64 systems.
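If you do keep shipping only the 32-bit build, it can still be useful to detect at runtime that you are on a 64-bit system (for support logs, installer decisions, and so on). A hedged sketch using the long-standing IsWow64Process API:

```cpp
#include <windows.h>
#include <cstdio>

int main() {
    BOOL underWow64 = FALSE;
    // For a 32-bit process, TRUE means it is running on 64-bit Windows
    // under the WOW64 layer; a native 64-bit process reports FALSE.
    if (IsWow64Process(GetCurrentProcess(), &underWow64)) {
        std::printf("Running under WOW64: %s\n", underWow64 ? "yes" : "no");
    }
}
```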
Music software is a bit vague. If you are developing music encoding/decoding software professionally, then 64-bit is something you should take seriously, since it can have a noticeable impact on encoding/decoding performance.
Otherwise, while 64-bit is getting increasingly popular, your 32-bit app will still run fine, so other features are more important in the meantime. However, you should start thinking about 64-bit porting too and refactor your code to be more portable as you go.
I would agree with others here that now is a great time to start supporting 64-bit operating systems. With Windows 7 just around the corner, you will see a much larger portion of users running 64-bit operating systems. Even if your software is not 100% optimized for 64-bit processors, the port will gain access to the additional registers and other features associated with running 64-bit code, and it could see a performance increase. Not to mention not running up against the 4 GB wall and all that.
Just keep in mind your data structures may change in size and your application will likely use more memory.
If I'm wrong about any of this, please, someone correct me!
It's not time to port yet; just be sure to also test your software on 64-bit systems. The WOW64 layer on Vista or 7 is good enough and should not cause any trouble.
The main advantage is the larger amount of RAM that can be allocated. If your app uses a lot of RAM and does a lot of caching, then you should port.
x64 PC and OS market share will only rise. It's the future. Best to support the future early on.

Determining recommended system requirements

We recently changed some of our system requirements on a lightweight application (it is essentially a thin GUI client that connects to a "mainframe" that runs IBM UniVerse). We didn't change our minimum requirements at all, but changed our recommended requirements to match those of Windows 7 and Vista (since we run on those machines).
Some system requirements are fairly easy to determine (e.g., network card, hard drive space, etc.). But CPU and RAM are harder to nail down.
Our current minimum requirements for CPU and RAM both state that you have to meet the minimums for your operating system. That seems fairly reasonable to us, since our app uses only 15 MB of active memory and very little CPU (it's a simple GUI, in this case), so that works. This seems fine; no one complains about it.
When it comes to recommended requirements, though, we've run into trouble nailing down specifics, especially nowadays, when saying "minimum 1.6 GHz" (or similar) can mean anything once you start talking about multi-core processors, Atom processors, etc. The thin client is starting to do more intensive work (it now contains an embedded web browser to help display more user-friendly HTML pages, for example).
What would be a good way to go about determining recommended values for CPU and RAM?
Do you take the recommended values for the OS and add your usage values on top (so do we then say 1 GB for Vista machines)?
Is there a better way to do so?
(Note: this is similar in nature to the server question here, but from an application base instead)
Let's try this from another perspective.
First, test your application on a minimum-configuration machine. What bottlenecks, if any, exist?
Does it cause a lot of disk swapping? If so, you need more RAM.
Is it generally slow when performing regular operations (excluding memory pressure)? If so, increase the processor requirements.
Does it require disk space beyond the app's footprint, such as for file handling? If so, list that.
Does your app depend on certain instruction sets being available on the chip (SSE, Execute Disable Bit, Intel Virtualization, for example)? If so, then you have to list which processors will actually work with the app.
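If the app does depend on particular instruction sets, a startup check lets you fail gracefully instead of crashing on unsupported hardware. A rough sketch using the MSVC __cpuid intrinsic (feature-bit positions per Intel's CPUID documentation; other compilers need their own intrinsic or inline assembly):

```cpp
#include <intrin.h>   // __cpuid (MSVC)
#include <cstdio>

int main() {
    int info[4] = {0};          // EAX, EBX, ECX, EDX
    __cpuid(info, 1);           // leaf 1: processor feature flags
    bool sse2 = (info[3] & (1 << 26)) != 0;  // EDX bit 26
    bool sse3 = (info[2] & (1 << 0))  != 0;  // ECX bit 0
    std::printf("SSE2: %s, SSE3: %s\n",
                sse2 ? "yes" : "no", sse3 ? "yes" : "no");
}
```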
Typically speaking, if the app works fine on a minimum configuration for the OS, then your "recommended" configuration should be identical to the OS's recommended configuration.
At the end of the day, you probably need to have a couple of machines on hand to profile. Virtual machines are NOT a good option in this case. By definition, the VM and the host OS will have an impact. Further, just because you can throttle a certain processor down doesn't mean that it is running at a similar level to a processor normally built for that level.
For example, a Dual Core 1.8 GHz processor throttled to only use one core is still a very different beast than a P4 1.8 GHz processor. There are architectural differences as well as L2 and L3 cache changes.
By the same token, a machine with a P4 processor uses a different type of RAM than one with a dual core (DDR vs DDR2). RAM speeds do have an impact.
So, try to stick to the OS recommendations as they've already done the hard part for you.
Come up with some concrete non-functional requirements relating to things like latency of response, throughput, and startup time, and then benchmark them on a few varied machines. Then attempt to extrapolate what hardware will allow a typical user to have an experience that matches your requirements.
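One lightweight way to turn "latency of response" or "startup time" into numbers you can compare across machines is a small timing harness like the sketch below (the workload is a placeholder; swap in a real operation from your app):

```cpp
#include <chrono>
#include <cstdio>

// Measure the wall-clock time of one user-visible operation in milliseconds.
template <typename Fn>
double time_ms(Fn&& operation) {
    auto start = std::chrono::steady_clock::now();
    operation();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

int main() {
    // Placeholder standing in for e.g. "open the main window" or "run a query".
    double ms = time_ms([] {
        volatile long sum = 0;
        for (long i = 0; i < 10000000; ++i) sum += i;
    });
    std::printf("operation took %.1f ms\n", ms);
}
```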
For determining CPU and RAM requirements, you could try using Microsoft Virtual PC, which allows you to adjust the CPU and RAM settings. You can then test a few different setups to see what would be sufficient for a regular user.
As for the recommended requirements, adding them on top of the basic OS requirements would probably be the safe bet.
Microsoft introduced the Windows Experience Index in Vista to solve this exact problem.
UPDATE FOR MORE INFO
It takes into consideration the entire system. Bear in mind that they may have a minimum level processor, but if they have a crap video card then a lot of processor time is going to be spent just drawing the windows... If you pick a decent experience index number like 3.0 then you can be reasonably assured that they will have a good experience with your application. If you require more horsepower, bump up the requirements to 4.0.
One example is the Dell I'm using to type this on. It's a 2-year-old machine but still registers 4.2 on the Experience Index. Most business-class machines should be able to register at least a 3, which should be enough horsepower for the app you described.
Incidentally, my 5 year old laptop registers as a 2.0 and it was mid level at the time I purchased it.

Resources