Improving the performance of System.String to std::wstring conversions? - visual-studio-2010

I'm currently evaluating the use of ADO.NET for a C++ application that currently uses plain old ADO. Given that we're redoing the whole database interaction, we'd like to determine if using the more modern and actively developed technology of ADO.NET would be beneficial.
After some measurements it appears that for certain test queries that retrieve a lot of rows with few columns that all contain strings, ADO.NET is actually about 20% slower for us than using plain ADO. Our profiler suggests that the conversion of System.String results into the std::wstring used by the application is one of the bottlenecks. I can't switch any of the upper layers of the application to using System.String, so we are stuck with this particular conversion.
A rough outline of the code looks like this:
    System::Data::SqlClient::SqlCommand^ sqlCmd =
        gcnew System::Data::SqlClient::SqlCommand(cmd, m_DBConnection.get());
    System::Data::SqlClient::SqlDataReader^ reader = sqlCmd->ExecuteReader();
    if (reader->HasRows)
    {
        using namespace msclr::interop;
        while (reader->Read())
        {
            std::vector<std::wstring> results;
            for (int i = 0; i < reader->FieldCount; ++i)
            {
                std::wstring col_data;
                TypeCode type = Type::GetTypeCode(reader->GetFieldType(i));
                switch (type)
                {
                // ... omit lots of different types
                case TypeCode::String:
                    {
                        System::String^ tmp = reader->GetString(i);
                        col_data = marshal_as<std::wstring>(tmp);
                    }
                    break;
                // ... more type conversion code removed
                }
            }
            // NOTE: Callback into native result processing code
        }
    }
I've spent a lot of time reading up on the various ways of getting a std::wstring out of a System.String and measured most of them. They all seem to perform roughly the same; we're talking decimal points in the percentage of CPU usage. In the end I simply settled on marshal_as<std::wstring>, as it's the most readable and appears to be as performant as the other solutions (i.e., using PtrToStringChars or the method described in MSDN here).
Using the DataReader works very well from a conceptual point of view as most of the processing we do on the data is row oriented anyway.
The only other slightly unexpected bottleneck I noticed is the retrieval of the TypeCode for the results columns; I'm already planning to move that outside the main results processing loop and only retrieve the type codes once per query result.
After this lengthy introduction, can anybody recommend a less costly way to convert the string data from a System.String to a std::wstring or am I already looking at the optimum performance here? I'm obviously more looking for slightly out of the ordinary ways given that I've already tried all the ordinary ones...
EDIT: Looks like I fell into a trap of my own making here. Yes, the code above is about 20% slower than the equivalent plain ADO code in Debug mode. However switching it into Release mode, the bottleneck is still measurable but the ADO.NET code above is suddenly almost 50% faster than the older ADO code. So while I'm still concerned a little about the cost of the string conversion, it's not as big in Release mode as it first appeared.

I don't see there being any way to optimize that, since the implementation of marshal_as<std::wstring> just grabs the internal C string and assigns it to an std::wstring. You can't get much more efficient than that.
The only solution I can see is splitting up your rows and having N threads process them in parallel. The only issue is that you would need to reserve enough space in your vector to prevent a resize from taking place during processing, but that looks easy enough.
If you're using Visual Studio 2010, I think the C++0x threading library would be sufficient for this task, though I'm not sure how much (if any) is implemented in Visual Studio so far.


Recreating bugs in cocos2d iphone

I guess someone must have asked a similar question before, but here goes.
It would be useful to be able to record games so that if a bug happened during the game, the recorded play can be reused later with a fixed build to confirm if the bug is fixed or not. I am using box2d as well and from what I remember it seems as if box2d is not really deterministic, but at least being able to recreate most of the state from the first time would be OK in many cases. Recreating the same randomized values would take reinstating the same time etc I assume. Any insight?
I have been fiddling with calabash-ios with varying success. I know it's possible to record plays and play them back later. I just assume it wouldn't recreate random values.
A quick look at box2d faq and I think box2d is deterministic enough
For the same input, and same binary, Box2D will reproduce any
simulation. Box2D does not use any random numbers nor base any
computation on random events (such as timers, etc).
However, people often want more stringent determinism. People often
want to know if Box2D can produce identical results on different
binaries and on different platforms. The answer is no. The reason for
this answer has to do with how floating point math is implemented in
many compilers and processors. I recommend reading this article if you
are curious:
If you encapsulate the input state the player gives to the world each time step (eg. in a POD struct) then it's pretty straightforward to write that to a file. For example, suppose you have input state like:
    struct inputStruct {
        bool someButtonPressed;
        bool someOtherKeyPressed;
        float accelerometerZ;
        // ... etc
    };
Then you can do something like this each time step:
    inputStruct currentState;
    currentState.someButtonPressed = ...; // set contents from live user input

    if ( recording )
        fwrite( &currentState, sizeof(inputStruct), 1, file );
    else if ( replaying ) {
        inputStruct tmpState;
        int readCount = fread( &tmpState, sizeof(inputStruct), 1, file );
        if ( readCount == 1 )
            currentState = tmpState; // overwrite live input
    }

    applyState( currentState ); // apply forces, game logic from input
    world->Step( ... ); // step the Box2D world
Please excuse the C++ centric code :~) No doubt there are equivalent ways to do it with Objective-C.
This method lets you regain live control when the input from the file runs out. 'file' is a FILE* that you would have to open in the appropriate mode (rb or wb) when the level was loaded. If the bug you're chasing causes a crash, you might need to fflush after writing to make sure the input state actually gets written before crashing.
As you have noted, this is highly unlikely to work across different platforms. You should not assume that the replay file will reproduce the same result on anything other than the device that recorded it (which should be fine for debugging purposes).
As for random values, you'll need to ensure that anything using random values that may affect the Box2D world goes through a deterministic random generator which is not shared with other code, and you'll need to record the seed that was used for each replay. You might like to use one of the many implementations of Mersenne Twister found at
When I say 'not shared', suppose you also use the MT algorithm to generate random directions for particles, purely for rendering purposes - you would not want to use the same generator instance for that as you do for physics-related randomizations.
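To make the 'not shared' point concrete, here is a minimal sketch using the standard library's std::mt19937 (one Mersenne Twister implementation among many). The GameRandom struct, its member names, and the drawPhysics helper are hypothetical, made up for this illustration. The idea is only that the physics stream has its own generator instance and that the seed is kept so it can be written next to the recorded input.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical wrapper: one generator per purpose, seeded explicitly so the
// seed can be stored alongside the recorded input.
struct GameRandom {
    std::uint32_t seed;     // the value you would write into the replay file
    std::mt19937 physics;   // only code that affects the Box2D world draws here
    std::mt19937 cosmetic;  // particles and other render-only effects

    explicit GameRandom(std::uint32_t s)
        : seed(s), physics(s), cosmetic(s ^ 0x9E3779B9u) {}
};

// Draws `count` raw values from the physics stream. Between physics draws it
// also pulls `cosmeticDrawsPerStep` values from the cosmetic stream, the way
// rendering code running at a variable rate might.
std::vector<std::uint32_t> drawPhysics(GameRandom& rng, int count,
                                       int cosmeticDrawsPerStep) {
    assert(count >= 0 && cosmeticDrawsPerStep >= 0);
    std::vector<std::uint32_t> values;
    for (int i = 0; i < count; ++i) {
        for (int j = 0; j < cosmeticDrawsPerStep; ++j)
            rng.cosmetic();  // does not disturb the physics stream
        values.push_back(static_cast<std::uint32_t>(rng.physics()));
    }
    return values;
}
```

With the same seed, the physics stream yields the same values no matter how often the cosmetic stream is used in between. One caveat: this sketch uses the raw engine output on purpose. The standard distribution classes (std::uniform_real_distribution and friends) are not guaranteed to produce identical values across standard library implementations, which matters less here since the replay is only expected to work with the same binary anyway.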

What's the most efficient way to ignore code in lua?

I have a chunk of lua code that I'd like to be able to (selectively) ignore. I don't have the option of not reading it in and sometimes I'd like it to be processed, sometimes not, so I can't just comment it out (that is, there's a whole bunch of blocks of code and I either have the option of reading none of them or reading all of them). I came up with two ways to implement this (there may well be more - I'm very much a beginner): either enclose the code in a function and then call or not call the function (and once I'm sure I'm past the point where I would call the function, I can set it to nil to free up the memory) or enclose the code in an if ... end block. The former has a slight advantage in that there are several of these blocks and using the former method makes it easier for one block to load another even if the main program didn't request it, but the latter seems the more efficient. However, not knowing much, I don't know if the efficiency saving is worth it.
So how much more efficient is:
    if false then
        -- a few hundred lines
    end
than
    throwaway = function ()
        -- a few hundred lines
    end
    throwaway = nil -- to ensure that both methods leave me in the same state after garbage collection
If it depends a lot on the lua implementation, how big would the "few hundred lines" need to be to reliably spot the difference, and what sort of stuff should it include to best test (the main use of the blocks is to define a load of possibly useful functions)?
Lua's not smart enough to dump the code for the function, so you're not going to save any memory.
In terms of speed, you're talking about a difference of nanoseconds which happens once per program execution. It's harming your efficiency to worry about this, which has virtually no relevance to actual performance. Write the code that you feel expresses your intent most clearly, without trying to be clever. If you run into performance issues, it's going to be a million miles away from this decision.
If you want to save memory, which is understandable on a mobile platform, you could put your conditional code in its own module and never load it at all if not needed (if your framework supports it; e.g. MOAI does, Corona doesn't).
If there is really a lot of unused code, you can define it as a collection of strings and loadstring() it when needed. Storing functions as strings will reduce the initial compile time; however, for most functions the string representation probably takes up more memory than its compiled form, and what you save when compiling is probably not significant before a few thousand lines... Just saying.
If you put this code in a table, you could compile it transparently through a metatable for minimal performance impact on repeated calls.
Example code
    local code_uncompiled = {
        f = [=[
            local x, y = ...;
            return x+y;
        ]=]
    }
    code = setmetatable({}, {
        __index = function(self, k)
            self[k] = assert(loadstring(code_uncompiled[k]));
            return self[k];
        end
    })
    local ff = code.f; -- code of f gets compiled here
    ff = code.f; -- no compilation here
    for i=1, 1000 do
        print( ff(2*i, -i) ); -- no compilation here either
        print( code.f(2*i, -i) ); -- no compile either, but table access (slower)
    end
The beauty of it is that this compiles as needed and you don't really have to waste another thought on it, it's just like storing a function in a table and allows for a lot of flexibility.
Another advantage of this solution is that when the amount of dynamically loaded code gets out of hand, you could transparently change it to load code from external files on demand through the __index function of the metatable. Also, you can mix compiled and uncompiled code by populating the "code" table with "real" functions.
Try the one that makes the code more legible to you first. If it runs fast enough on your target machine, use that.
If it doesn't run fast enough, try the other one.
lua can ignore multiple lines by wrapping them in a block comment:
    function dostuff()
        --[[
        ignore this
        and this
        maybe this
        this as well
        ]]
    end

Why does loading cached objects increase the memory consumption drastically when computing them will not?

Relevant background info
I've built a little software that can be customized via a config file. The config file is parsed and translated into a nested environment structure (e.g. .HIVE$db = an environment, .HIVE$db$user = "Horst", .HIVE$db$pw = "my password", .HIVE$regex$date = some regex for dates etc.)
I've built routines that can handle those nested environments (e.g. look up value "db/user" or "regex/date", change it etc.). The thing is that the initial parsing of the config files takes a long time and results in quite a big object (actually three to four, between 4 and 16 MB). So I thought "No problem, let's just cache them by saving the object(s) to .Rdata files". This works, but "loading" cached objects makes my Rterm process go through the roof with respect to RAM consumption (over 1 GB!!) and I still don't really understand why (this doesn't happen when I "compute" the object all anew, but that's exactly what I'm trying to avoid since it takes too long).
I already thought about maybe serializing it, but I haven't tested it as I would need to refactor my code a bit. Plus I'm not sure if it would affect the "loading back into R" part in just the same way as loading .Rdata files.
Can anyone tell me why loading a previously computed object has such effects on memory consumption of my Rterm process (compared to computing it in every new process I start) and how best to avoid this?
If desired, I will also try to come up with an example, but it's a bit tricky to reproduce my exact scenario. Yet I'll try.
It's likely because the environments you are creating are carrying around their ancestors. If you don't need the ancestor information then set the parents of such environments to emptyenv() (or just don't use environments if you don't need them).
Also note that formulas (and, of course, functions) have environments so watch out for those too.
If it's not reproducible by others, it will be hard to answer. However, I do something quite similar to what you're doing, yet I use JSON files to store all of my values. Rather than parse the text, I use RJSONIO to convert everything to a list, and getting stuff from a list is very easy. (You could, if you want, convert to a hash, but it's nice to have layers of nested parameters.)
See this answer for an example of how I've done this kind of thing. If that works out for you, then you can forego the expensive translation step and the memory ballooning.
(Taking a stab at the original question...) I wonder if your issue is that you are using an environment rather than a list. Saving environments might be tricky in some contexts. Saving lists is no problem. Try using a list or try converting to/from an environment. You can use the as.list() and as.environment() functions for this.

Does soCaseInsensitive greatly impact performance for a TdxMemIndex on a TdxMemDataset?

I am adding some indexes to my DevExpress TdxMemDataset to improve performance. The TdxMemIndex has SortOptions which include the option for soCaseInsensitive. My data is usually a GUID string, so it is not case sensitive. I am wondering if I am better off just forcing all the data to the same case or if the soCaseInsensitive flag and using the loCaseInsensitive flag with the call to Locate has only a minor performance penalty (roughly equal to converting the case of my string every time I need to use the index).
At this point I am leaving the case-insensitive options off and just converting case.
IMHO, the best approach is to ensure data quality at Post time. Reasons:
You (usually) know the nature of the data. So, e.g., you can use UpperCase (knowing that GUIDs are all in the ASCII range) instead of the much slower AnsiUpperCase, which a general component like TdxMemDataSet is forced to use.
You enter the data only once. Searching, sorting, and filtering, which all go through the internal uppercasing engine of TdxMemDataSet, are repeated actions. Also, there are other chained actions that will trigger this engine without you realizing it (e.g. a TcxGrid which is sorted by default with GridMode := True (I assume that you use the DevEx components) and a class acting as a broker passing the sort message to the underlying dataset).
Usually data entry is done in steps, one or a few records in a batch. The only notable exception is data acquisition applications. But in both cases above, users tolerate much longer response times, which gives you room to play with. (IOW, how much would an UpperCase call add to a record post which lasts 0.005 ms?) OTOH, users are very demanding about the speed of data retrieval operations (searching, sorting, filtering etc.). Keep data retrieval as fast as you can.
Having the data in the database ready to expose reduces the risk of processing errors when you write (if you write) other modules (otherwise you need to remember to AnsiUpperCase the data in any module, in any language, you write). A classic example here is when you use external tools to access the data (e.g. database managers to execute an SQL SELECT over the data).
Maybe the DevExpress forums (or even a support email, if you have access to it) would be a better place to seek an authoritative answer on that performance question.
Anyway, it is better to guarantee that the data is in the format you want, for the reasons plainth already explained, at the moment you save it. So, in this specific case, make sure the GUID is written in upper case (or lower case, it's a matter of taste). If it is SQL Server or another database server that has a GUID datatype, make sure the SELECT does the work, if applicable and possible, even the sort.

What is the most ridiculous pessimization you've seen? [closed]

We all know that premature optimization is the root of all evil because it leads to unreadable/unmaintainable code. Even worse is pessimization, when someone implements an "optimization" because they think it will be faster, but it ends up being slower, as well as being buggy, unmaintainable, etc. What is the most ridiculous example of this that you've seen?
I think the phrase "premature optimization is the root of all evil" is way, way overused. For many projects, it has become an excuse not to take performance into account until late in a project.
This phrase is often a crutch for people to avoid work. I see this phrase used when people should really say "Gee, we really didn't think of that up front and don't have time to deal with it now".
I've seen many more "ridiculous" examples of dumb performance problems than examples of problems introduced due to "pessimization"
Reading the same registry key thousands (or tens of thousands) of times during program launch.
Loading the same DLL hundreds or thousands of times
Wasting megabytes of memory by keeping full paths to files needlessly
Not organizing data structures so they take up way more memory than they need
Sizing all strings that store file names or paths to MAX_PATH
Gratuitous polling for things that have events, callbacks or other notification mechanisms
What I think is a better statement is this: "optimization without measuring and understanding isn't optimization at all - it's just random change".
Good performance work is time consuming - often more so than the development of the feature or component itself.
Databases are pessimization playland.
Favorites include:
Split a table into multiples (by date range, alphabetic range, etc.) because it's "too big".
Create an archive table for retired records, but continue to UNION it with the production table.
Duplicate entire databases by (division/customer/product/etc.)
Resist adding columns to an index because it makes it too big.
Create lots of summary tables because recalculating from raw data is too slow.
Create columns with subfields to save space.
Denormalize into fields-as-an-array.
That's off the top of my head.
I think there is no absolute rule: some things are best optimized upfront, and some are not.
For example, I worked in a company where we received data packets from satellites. Each packet cost a lot of money, so all the data was highly optimized (ie. packed). For example, latitude/longitude was not sent as absolute values (floats), but as offsets relative to the "north-west" corner of a "current" zone. We had to unpack all the data before it could be used. But I think this is not pessimization, it is intelligent optimization to reduce communication costs.
On the other hand, our software architects decided that the unpacked data should be formatted into a very readable XML document, and stored in our database as such (as opposed to having each field stored in a corresponding column). Their idea was that "XML is the future", "disk space is cheap", and "processor is cheap", so there was no need to optimize anything. The result was that our 16-byte packets were turned into 2kB documents stored in one column, and for even simple queries we had to load megabytes of XML documents in memory! We received over 50 packets per second, so you can imagine how horrible the performance became (BTW, the company went bankrupt).
So again, there is no absolute rule. Yes, sometimes optimization too early is a mistake. But sometimes the "cpu/disk space/memory is cheap" motto is the real root of all evil.
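To illustrate the kind of packing described for the satellite data, here is a rough sketch of storing a position as two 16-bit offsets from the north-west corner of a zone. Everything in it is an assumption made up for the example (the Zone layout, the 16-bit width, the function names); the real packet format is not described above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical zone: a square whose north-west corner is the reference point.
struct Zone {
    double northLat;  // latitude of the north-west corner, in degrees
    double westLon;   // longitude of the north-west corner, in degrees
    double sizeDeg;   // edge length of the zone, in degrees
};

struct Position {
    double lat;
    double lon;
};

// 4 bytes on the wire instead of two 4-byte floats.
struct PackedPosition {
    std::uint16_t south;  // steps south of the corner
    std::uint16_t east;   // steps east of the corner
};

// Quantizes a position inside the zone to 65535 steps per axis.
PackedPosition pack(const Zone& zone, Position p) {
    const double step = zone.sizeDeg / 65535.0;
    const double south = (zone.northLat - p.lat) / step;
    const double east = (p.lon - zone.westLon) / step;
    assert(south >= 0.0 && south <= 65535.0);  // position must lie in the zone
    assert(east >= 0.0 && east <= 65535.0);
    return { static_cast<std::uint16_t>(std::lround(south)),
             static_cast<std::uint16_t>(std::lround(east)) };
}

// Reverses pack(); the result is off by at most half a step per axis.
Position unpack(const Zone& zone, PackedPosition q) {
    const double step = zone.sizeDeg / 65535.0;
    return { zone.northLat - q.south * step, zone.westLon + q.east * step };
}
```

The trade-off is the one from the story: the receiver has to know the current zone and do a little arithmetic before the values are usable, in exchange for fewer bytes per packet.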
On an old project we inherited some (otherwise excellent) embedded systems programmers who had massive Z-8000 experience.
Our new environment was 32-bit Sparc Solaris.
One of the guys went and changed all ints to shorts to speed up our code, since grabbing 16 bits from RAM was quicker than grabbing 32 bits.
I had to write a demo program to show that grabbing 32-bit values on a 32-bit system was faster than grabbing 16-bit values, and explain that to grab a 16-bit value the CPU had to make a 32-bit wide memory access and then mask out or shift the bits not needed for the 16-bit value.
Oh good Lord, I think I have seen them all. More often than not it is an effort to fix performance problems by someone that is too darn lazy to troubleshoot their way down to the CAUSE of those performance problems or even researching whether there actually IS a performance problem. In many of these cases I wonder if it isn't just a case of that person wanting to try a particular technology and desperately looking for a nail that fits their shiny new hammer.
Here's a recent example:
Data architect comes to me with an elaborate proposal to vertically partition a key table in a fairly large and complex application. He wants to know what type of development effort would be necessary to adjust for the change. The conversation went like this:
Me: Why are you considering this? What is the problem you are trying to solve?
Him: Table X is too wide, we are partitioning it for performance reasons.
Me: What makes you think it is too wide?
Him: The consultant said that is way too many columns to have in one table.
Me: And this is affecting performance?
Him: Yes, users have reported intermittent slowdowns in the XYZ module of the application.
Me: How do you know the width of the table is the source of the problem?
Him: That is the key table used by the XYZ module, and it is like 200 columns. It must be the problem.
Me (Explaining): But module XYZ in particular uses most of the columns in that table, and the columns it uses are unpredictable because the user configures the app to show the data they want to display from that table. It is likely that 95% of the time we'd wind up joining all the tables back together anyway which would hurt performance.
Him: The consultant said it is too wide and we need to change it.
Me: Who is this consultant? I didn't know we hired a consultant, nor did they talk to the development team at all.
Him: Well, we haven't hired them yet. This is part of a proposal they offered, but they insisted we needed to re-architect this database.
Me: Uh huh. So the consultant who sells database re-design services thinks we need a database re-design....
The conversation went on and on like this. Afterward, I took another look at the table in question and determined that it probably could be narrowed with some simple normalization with no need for exotic partitioning strategies. This, of course turned out to be a moot point once I investigated the performance problems (previously unreported) and tracked them down to two factors:
Missing indexes on a few key
A few rogue data analysts who were periodically locking key tables (including the "too-wide" one) by querying the production database directly with
Of course the architect is still pushing for a vertical partitioning of the table hanging on to the "too wide" meta-problem. He even bolstered his case by getting a proposal from another database consultant who was able to determine we needed major design changes to the database without looking at the app or running any performance analysis.
I have seen people using alphadrive-7 to totally incubate CHX-LT. This is an uncommon practice. The more common practice is to initialize the ZT transformer so that bufferication is reduced (due to greater net overload resistance) and create java style bytegraphications.
Totally pessimistic!
Nothing Earth-shattering, I admit, but I've caught people using StringBuffer to concatenate Strings outside of a loop in Java. It was something simple like turning
    String msg = "Count = " + count + " of " + total + ".";
into
    StringBuffer sb = new StringBuffer("Count = ");
    sb.append(count);
    sb.append(" of ");
    sb.append(total);
    sb.append(".");
    String msg = sb.toString();
It used to be quite common practice to use the technique in a loop, because it was measurably faster. The thing is, StringBuffer is synchronized, so there's actually extra overhead if you're only concatenating a few Strings. (Not to mention that the difference is absolutely trivial on this scale.) Two other points about this practice:
StringBuilder is unsynchronized, so should be preferred over StringBuffer in cases where your code can't be called from multiple threads.
Modern Java compilers will turn readable String concatenation into optimized bytecode for you when it's appropriate anyway.
I once saw an MSSQL database that used a 'Root' table. The Root table had four columns: GUID (uniqueidentifier), ID (int), LastModDate (datetime), and CreateDate (datetime). All tables in the database were Foreign Key'd to the Root table. Whenever a new row was created in any table in the db, you had to use a couple of stored procedures to insert an entry in the Root table before you could get to the actual table you cared about (rather than the database doing the job for you with a few simple triggers).
This created a mess of useless overhead and headaches, required anything written on top of it to use sprocs (eliminating my hopes of introducing LINQ to the company; it was possible but just not worth the headache), and to top it off didn't even accomplish what it was supposed to do.
The developer that chose this path defended it under the assumption that this saved tons of space because we weren't using Guids on the tables themselves (but...isn't a GUID generated in the Root table for every row we make?), improved performance somehow, and made it "easy" to audit changes to the database.
Oh, and the database diagram looked like a mutant spider from hell.
How about POBI -- pessimization obviously by intent?
A colleague of mine in the 90s was tired of getting kicked in the ass by the CEO, who spent the first day of every release of our (custom) ERP software hunting for performance issues in the new functionality. Even if the new functionality crunched gigabytes and made the impossible possible, he always found some detail, or even a seemingly major issue, to whine about. He believed he knew a lot about programming and got his kicks by kicking programmers' asses.
Due to the incompetent nature of the criticism (he was a CEO, not an IT guy), my colleague never managed to get it right. If you do not have a performance problem, you cannot eliminate it...
Until, for one release, he put a lot of Delay(200) function calls (it was Delphi) into the new code.
It took just 20 minutes after go-live, and he was ordered to appear in the CEO's office to fetch his overdue insults in person.
The only unusual thing so far was my colleague's mood when he returned: smiling, joking, going out for a Big Mac or two, while he normally would kick tables, rant about the CEO and the company, and spend the rest of the day thoroughly depressed.
Naturally, my colleague now rested for one or two days at his desk, improving his aiming skills in Quake. Then on the second or third day he deleted the Delay calls, rebuilt, and released an "emergency patch", spreading the word that he had spent 2 days and 1 night fixing the performance holes.
This was the first (and only) time that evil CEO said "great job!" to him. That's all that counts, right?
This was real POBI.
But it also is a kind of social process optimization, so it's 100% ok.
I think.
"Database Independence". This meant no stored procs, triggers, etc - not even any foreign keys.
var stringBuilder = new StringBuilder();
stringBuilder.Append(myObj.a + myObj.b + myObj.c + myObj.d);
string cat = stringBuilder.ToString();
Best use of a StringBuilder I've ever seen.
Using a regex to split a string when a simple string.split suffices
Very late to this thread I know, but I saw this recently:
    bool isFinished = GetIsFinished();
    switch (isFinished)
    {
        case true:
            // ...
        case false:
            // ...
    }
Worst example I can think of is an internal database at my company containing information on all employees. It gets a nightly update from HR and has an ASP.NET web service on top. Many other apps use the web service to populate things like search/dropdown fields.
The pessimism is that the developer thought that repeated calls to the web service would be too slow to make repeated SQL queries. So what did he do? The application start event reads in the entire database and converts it all to objects in memory, stored indefinitely until the app pool is recycled. This code was so slow, it would take 15 minutes to load in less than 2000 employees. If you inadvertently recycled the app pool during the day, it could take 30 minutes or more, because each web service request would start multiple concurrent reloads. For this reason, new hires wouldn't appear in the database the first day when their account was created and therefore would not be able to access most internal apps on their first couple days, twiddling their thumbs.
The second level of pessimism is that the development manager doesn't want to touch it for fear of breaking dependent applications, but yet we continue to have sporadic company-wide outages of critical applications due to poor design of such a simple component.
No one seems to have mentioned sorting, so I will.
Several different times, I've discovered that someone had hand-crafted a bubblesort, because the situation "didn't require" a call to the "too fancy" quicksort algorithm that already existed. The developer was satisfied when their handcrafted bubblesort worked well enough on the ten rows of data that they were using for testing. It didn't go over quite as well after the customer had added a couple of thousand rows.
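For a feel of why ten rows hide the problem, here is a small sketch that counts comparisons for a plain bubble sort and for std::sort. The function names and the input generator are made up for this illustration, and the bubble sort is a textbook version, not anyone's actual production code. Without an early-exit check, bubble sort always makes n(n-1)/2 comparisons: 45 for ten rows, 1,999,000 for two thousand.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Textbook bubble sort; returns the number of element comparisons made.
std::size_t bubbleSortCounting(std::vector<int>& v) {
    std::size_t comparisons = 0;
    for (std::size_t pass = 0; pass + 1 < v.size(); ++pass) {
        // After each pass the largest remaining value has reached its place,
        // so the inner loop can stop one element earlier every time.
        for (std::size_t i = 0; i + 1 < v.size() - pass; ++i) {
            ++comparisons;
            if (v[i] > v[i + 1])
                std::swap(v[i], v[i + 1]);
        }
    }
    assert(std::is_sorted(v.begin(), v.end()));
    return comparisons;
}

// The library sort, instrumented the same way through its comparator.
std::size_t stdSortCounting(std::vector<int>& v) {
    std::size_t comparisons = 0;
    std::sort(v.begin(), v.end(), [&comparisons](int a, int b) {
        ++comparisons;
        return a < b;
    });
    return comparisons;
}

// Repeatable pseudo-random input so both sorts see the same data.
std::vector<int> makeInput(std::size_t n) {
    std::vector<int> v(n);
    unsigned state = 12345u;
    for (std::size_t i = 0; i < n; ++i) {
        state = state * 1103515245u + 12345u;  // simple LCG, wraps on overflow
        v[i] = static_cast<int>((state >> 16) % 10000u);
    }
    return v;
}
```

Running both on makeInput(2000) and comparing the two counts shows the gap directly; the exact std::sort figure depends on the standard library, so it is not quoted here.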
I once worked on an app that was full of code like this:
1 tuple *FindTuple( DataSet *set, int target ) {
2 tuple *found = null;
3 tuple *curr = GetFirstTupleOfSet(set);
4 while (curr) {
5 if (curr->id == target)
6 found = curr;
7 curr = GetNextTuple(curr);
8 }
9 return found;
10 }
Simply removing found, returning null at the end, and changing the sixth line to:
return curr;
Doubled the app performance.
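A self-contained sketch of the two shapes, with a visit counter added so the difference is visible. The Tuple struct and both function names are stand-ins invented for this example, since the real DataSet API isn't shown. One thing to keep in mind: the original returns the last match and the early-return version returns the first, so the two only agree when ids are unique.

```cpp
#include <cstddef>

// Hypothetical stand-in for the tuple type: a singly linked list node.
struct Tuple {
    int id;
    Tuple* next;
};

// Shape of the original: remembers a match but keeps walking to the end.
// `visited` is incremented once per node examined.
Tuple* findFullScan(Tuple* head, int target, std::size_t& visited) {
    Tuple* found = nullptr;
    for (Tuple* curr = head; curr != nullptr; curr = curr->next) {
        ++visited;
        if (curr->id == target)
            found = curr;
    }
    return found;
}

// Early return: stops at the first match, so on average only about half of
// the list is visited when the target is present.
Tuple* findEarlyReturn(Tuple* head, int target, std::size_t& visited) {
    for (Tuple* curr = head; curr != nullptr; curr = curr->next) {
        ++visited;
        if (curr->id == target)
            return curr;
    }
    return nullptr;
}
```

That "about half the list on average" is consistent with the doubling mentioned above, assuming most lookups were hits.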
I once had to attempt to modify code that included these gems in the Constants class
public static String COMMA_DELIMINATOR=",";
public static String COMMA_SPACE_DELIMINATOR=", ";
public static String COLIN_DELIMINATOR=":";
Each of these was used multiple times in the rest of the application for different purposes. COMMA_DELIMINATOR littered the code with over 200 uses in 8 different packages.
The big all time number one which I run into time and time again in inhouse software:
Not using the features of the DBMS for "portability" reasons because "we might want to switch to another vendor later".
Read my lips. For any inhouse work: IT WILL NOT HAPPEN!
I had a co-worker who was trying to outwit our C compiler's optimizer and routinely rewrote code into something only he could read. One of his favorite tricks was changing a readable method like (making up some code):
int some_method(int input1, int input2) {
    int x;
    if (input1 == -1) {
        return 0;
    }
    if (input1 == input2) {
        return input1;
    }
    ... a long expression here ...
    return x;
}
into this:
int some_method(int input1, int input2) {
    return (input1 == -1) ? 0 : (input1 == input2) ? input1 :
        ... a long expression ...
        ... a long expression ...
        ... a long expression ...
}
That is, the first line of a once-readable method would become "return" and all other logic would be replaced by deeply nested ternary expressions. When you tried to argue that this was unmaintainable, he would point to the fact that the assembly output of his method was three or four instructions shorter. It wasn't necessarily any faster, but it was always a tiny bit shorter. This was an embedded system where memory usage occasionally did matter, but there were far easier optimizations available that would have left the code readable.
Then, after this, for some reason he decided that ptr->structElement was too unreadable, so he started changing all of these into (*ptr).structElement on the theory that it was more readable and faster as well.
Turning readable code into unreadable code for at the most a 1% improvement, and sometimes actually slower code.
In one of my first jobs as a full-fledged developer, I took over a project for a program that was suffering scaling issues. It would work reasonably well on small data sets, but would completely crash when given large quantities of data.
As I dug in, I found that the original programmer sought to speed things up by parallelizing the analysis - launching a new thread for each additional data source. However, he'd made a mistake in that all threads required a shared resource, on which they were deadlocking. Of course, all benefits of concurrency disappeared. Moreover it crashed most systems to launch 100+ threads only to have all but one of them lock. My beefy dev machine was an exception in that it churned through a 150-source dataset in around 6 hours.
So to fix it, I removed the multi-threading components and cleaned up the I/O. With no other changes, execution time on the 150-source dataset dropped below 10 minutes on my machine, and from infinity to under half an hour on the average company machine.
I suppose I could offer this gem:
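unsigned long isqrt(unsigned long value)
{
    unsigned long tmp = 1, root = 0;
#define ISQRT_INNER(shift) \
    { \
        if (value >= (tmp = ((root << 1) + (1 << (shift))) << (shift))) \
        { \
            root += 1 << shift; \
            value -= tmp; \
        } \
    }
    // Find out how many bytes our value uses
    // so we don't do any unneeded work.
    if (value & 0xffff0000)
    {
        if ((value & 0xff000000) == 0)
            tmp = 3;
        else
            tmp = 4;
    }
    else if (value & 0x0000ff00)
        tmp = 2;
    switch (tmp)
    {
        case 4:
            ISQRT_INNER(15); ISQRT_INNER(14); ISQRT_INNER(13); ISQRT_INNER(12);
        case 3:
            ISQRT_INNER(11); ISQRT_INNER(10); ISQRT_INNER(9); ISQRT_INNER(8);
        case 2:
            ISQRT_INNER(7); ISQRT_INNER(6); ISQRT_INNER(5); ISQRT_INNER(4);
        case 1:
            ISQRT_INNER(3); ISQRT_INNER(2); ISQRT_INNER(1); ISQRT_INNER(0);
    }
    return root;
}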
Since the square-root was calculated at a very sensitive place, I got the task of looking into a way to make it faster. This small refactoring reduced the execution time by a third (for the combination of hardware and compiler used, YMMV):
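unsigned long isqrt(unsigned long value)
{
    unsigned long tmp = 1, root = 0;
#define ISQRT_INNER(shift) \
    { \
        if (value >= (tmp = ((root << 1) + (1 << (shift))) << (shift))) \
        { \
            root += 1 << shift; \
            value -= tmp; \
        } \
    }
    ISQRT_INNER(15); ISQRT_INNER(14); ISQRT_INNER(13); ISQRT_INNER(12);
    ISQRT_INNER(11); ISQRT_INNER(10); ISQRT_INNER(9); ISQRT_INNER(8);
    ISQRT_INNER(7); ISQRT_INNER(6); ISQRT_INNER(5); ISQRT_INNER(4);
    ISQRT_INNER(3); ISQRT_INNER(2); ISQRT_INNER(1); ISQRT_INNER(0);
    return root;
}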
Of course there are both faster AND better ways to do this, but I think it's a pretty neat example of a pessimization.
Edit: Come to think of it, the unrolled loop was actually also a neat pessimization. Digging through the version control, I can present the second stage of refactoring as well, which performed even better than the above:
unsigned long isqrt(unsigned long value)
{
    unsigned long tmp = 1 << 30, root = 0;
    while (tmp != 0)
    {
        if (value >= root + tmp) {
            value -= root + tmp;
            root += tmp << 1;
        }
        root >>= 1;
        tmp >>= 2;
    }
    return root;
}
This is exactly the same algorithm, albeit a slightly different implementation, so I suppose it qualifies.
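If you want to convince yourself that the loop version is right, you can check it against the defining property root*root <= value < (root+1)*(root+1). The sketch below is my own restatement using the fixed-width std::uint32_t (an assumption; the original used unsigned long), and is_isqrt is a hypothetical checker, not part of the original code.

```cpp
#include <cstdint>

// Same bit-by-bit algorithm as the loop version, restated with a fixed 32-bit type.
std::uint32_t isqrt32(std::uint32_t value) {
    std::uint32_t bit = std::uint32_t{1} << 30;  // highest power of four that fits
    std::uint32_t root = 0;
    while (bit != 0) {
        if (value >= root + bit) {
            value -= root + bit;
            root += bit << 1;
        }
        root >>= 1;
        bit >>= 2;
    }
    return root;
}

// Hypothetical checker: true if r is the integer square root of v.
bool is_isqrt(std::uint32_t v, std::uint32_t r) {
    const std::uint64_t lo = static_cast<std::uint64_t>(r) * r;
    const std::uint64_t hi = (static_cast<std::uint64_t>(r) + 1) * (static_cast<std::uint64_t>(r) + 1);
    return lo <= v && v < hi;
}
```

Looping is_isqrt(v, isqrt32(v)) over a range of inputs is a cheap regression test to keep around while doing this kind of refactoring.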
This might be at a higher level than what you were after, but fixing it (if you're allowed) also involves a higher level of pain:
Insisting on hand rolling an Object-Relational Mapper / Data Access Layer instead of using one of the established, tested, mature libraries out there (even after they've been pointed out to you).
All foreign-key constraints were removed from a database, because otherwise there would be so many errors.
This doesn't exactly fit the question, but I'll mention it anyway as a cautionary tale. I was working on a distributed app that was running slowly, and flew down to DC to sit in on a meeting primarily aimed at solving the problem. The project lead started to outline a re-architecture aimed at resolving the delay. I volunteered that I had taken some measurements over the weekend that isolated the bottleneck to a single method. It turned out there was a missing record on a local lookup, causing the application to have to go to a remote server on every transaction. By adding the record back to the local store, the delay was eliminated - problem solved. Note that the re-architecture wouldn't have fixed the problem.
Checking before EVERY javascript operation whether the object you are operating upon exists.
if (myObj) { //or its evil cousin, if (myObj != null) {
    label.text = myObj.value;
    // we know label exists because it has already been
    // checked in a big if block somewhere at the top
}
My problem with this type of code is that nobody seems to care what happens if the object doesn't exist. Just do nothing? Give no feedback to the user?
I agree that "Object expected" errors are annoying, but this is not the best solution for that.
How about YAGNI extremism? It is a form of premature pessimization. It seems like any time you apply YAGNI, you end up needing it after all, and adding it later takes 10 times the effort it would have taken at the beginning. If you create a successful program, then odds are YOU ARE GOING TO NEED IT. If you are used to creating programs whose life runs out quickly, then continue to practice YAGNI, because then I suppose YAGNI.
Not exactly premature optimisation - but certainly misguided - this was read on the BBC website, from an article discussing Windows 7.
Mr Curran said that the Microsoft Windows team had been poring over every aspect of the operating system to make improvements.
"We were able to shave 400 milliseconds off the shutdown time by slightly trimming the WAV file shutdown music."
Now, I haven't tried Windows 7 yet, so I might be wrong, but I'm willing to bet that there are other issues in there that are more important than how long it takes to shut-down. After all, once I see the 'Shutting down Windows' message, the monitor is turned off and I'm walking away - how does that 400 milliseconds benefit me?
Someone in my department once wrote a string class. An interface like CString, but without the Windows dependence.
One "optimization" they did was to not allocate any more memory than necessary. Apparently not realizing that the reason classes like std::string do allocate excess memory is so that a sequence of += operations can run in O(n) time.
Instead, every single += call forced a reallocation, which turned repeated appends into an O(n²) Schlemiel the Painter's algorithm.
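A back-of-the-envelope sketch of the difference, counting how many already-stored characters get copied while appending n characters one at a time. The doubling policy is an assumption for illustration (real std::string implementations pick their own growth factor), and the names are made up.

```cpp
#include <cstddef>

// Characters copied during n single-character appends, under two
// hypothetical capacity policies.
struct CopyCount {
    std::size_t exact_fit;  // reallocate on every append
    std::size_t doubling;   // grow capacity geometrically
};

CopyCount count_copies(std::size_t n) {
    CopyCount c{0, 0};

    // Exact fit: every append reallocates and copies the current contents.
    for (std::size_t len = 0; len < n; ++len) c.exact_fit += len;

    // Doubling: copy only when the buffer is full, then double it.
    std::size_t capacity = 1;
    for (std::size_t len = 0; len < n; ++len) {
        if (len == capacity) {
            c.doubling += len;
            capacity *= 2;
        }
    }
    return c;
}
```

For 1,000 appends the exact-fit policy copies 499,500 characters in total, while doubling copies 1,023.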
An ex-coworker of mine (a s.o.a.b., actually) was assigned to build a new module for our Java ERP that should have collected and analyzed customers' data (retail industry). He decided to split EVERY Calendar/Datetime field into its components (seconds, minutes, hours, day, month, year, day of week, bimester, trimester (!)) because "how else would I query for 'every monday'?"
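The answer to "how else" is that the day of week can be derived from the date whenever you need it. The real module was Java and I don't have its code, so this is only a small C++ sketch of the idea using the standard std::mktime; the function names are made up.

```cpp
#include <ctime>

// Returns the day of week (0 = Sunday ... 6 = Saturday) for a calendar date,
// derived on demand instead of being stored as a separate field.
int day_of_week(int year, int month, int day) {
    std::tm t = {};
    t.tm_year = year - 1900;  // std::tm counts years from 1900
    t.tm_mon = month - 1;     // and months from 0
    t.tm_mday = day;
    t.tm_hour = 12;           // midday keeps clear of DST transitions
    t.tm_isdst = -1;          // let the library decide whether DST applies
    std::mktime(&t);          // normalizes the struct and fills in tm_wday
    return t.tm_wday;
}

bool is_monday(int year, int month, int day) {
    return day_of_week(year, month, day) == 1;
}
```

Most databases and date libraries offer an equivalent day-of-week function, so the extra fields buy nothing except more ways for the data to become inconsistent.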
No offense to anyone, but I just graded an assignment (Java) that had this:
import java.lang.*;
