What is the most ridiculous pessimization you've seen? [closed] - performance
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
We all know that premature optimization is the root of all evil because it leads to unreadable/unmaintainable code. Even worse is pessimization, when someone implements an "optimization" because they think it will be faster, but it ends up being slower, as well as being buggy, unmaintainable, etc. What is the most ridiculous example of this that you've seen?
I think the phrase "premature optimization is the root of all evil" is way, way overused. For many projects, it has become an excuse not to take performance into account until late in a project.
This phrase is often a crutch for people to avoid work. I see this phrase used when people should really say "Gee, we really didn't think of that up front and don't have time to deal with it now".
I've seen many more "ridiculous" examples of dumb performance problems than examples of problems introduced by "pessimization":
Reading the same registry key thousands (or tens of thousands) of times during program launch (a small caching sketch follows this list)
Loading the same DLL hundreds or thousands of times
Wasting megabytes of memory by keeping full paths to files needlessly
Not organizing data structures sensibly, so they take up far more memory than they need
Sizing all strings that store file names or paths to MAX_PATH
Gratuitous polling for things that have events, callbacks or other notification mechanisms
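Taking the first item as an example, here is a minimal sketch of the cure, assuming a Java program that keeps re-reading the same setting through java.util.prefs.Preferences (on Windows the user root is backed by the registry); the node and key names are invented for illustration:

import java.util.prefs.Preferences;

class StartupConfig {
    // Pessimized: every call goes back to the backing store.
    static String installDirSlow() {
        return Preferences.userRoot().node("com/example/app").get("installDir", ".");
    }

    // Better: read it once at class load and hand out the cached value.
    private static final String INSTALL_DIR =
            Preferences.userRoot().node("com/example/app").get("installDir", ".");

    static String installDir() {
        return INSTALL_DIR;
    }
}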
What I think is a better statement is this: "optimization without measuring and understanding isn't optimization at all - it's just random change".
Good performance work is time-consuming - often more so than the development of the feature or component itself.
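To make the "measure first" point concrete, here is a minimal, hypothetical sketch of timing a code path before deciding it needs optimizing; the workload and iteration count are made up, and a real investigation would use a profiler rather than ad-hoc timers:

class MeasureFirst {
    // Stand-in for whatever code someone is itching to "optimize".
    static double suspectedSlowThing(int i) {
        return Math.sqrt(i);
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        double sink = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sink += suspectedSlowThing(i);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("1,000,000 calls took " + elapsedMs + " ms (result " + sink + ")");
        // Only if this number is actually a problem is a change justified.
    }
}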
Databases are pessimization playland.
Favorites include:
Split a table into multiples (by date range, alphabetic range, etc.) because it's "too big".
Create an archive table for retired records, but continue to UNION it with the production table.
Duplicate entire databases by division/customer/product/etc.
Resist adding columns to an index because it makes it too big.
Create lots of summary tables because recalculating from raw data is too slow.
Create columns with subfields to save space.
Denormalize into fields-as-an-array.
That's off the top of my head.
I think there is no absolute rule: some things are best optimized upfront, and some are not.
For example, I worked at a company where we received data packets from satellites. Each packet cost a lot of money, so all the data was highly optimized (i.e., packed). For example, latitude/longitude were not sent as absolute values (floats), but as offsets relative to the "north-west" corner of the "current" zone. We had to unpack all the data before it could be used. But I think this is not pessimization; it is intelligent optimization to reduce communication costs.
On the other hand, our software architects decided that the unpacked data should be formatted into a very readable XML document, and stored in our database as such (as opposed to having each field stored in a corresponding column). Their idea was that "XML is the future", "disk space is cheap", and "processor is cheap", so there was no need to optimize anything. The result was that our 16-byte packets were turned into 2 kB documents stored in one column, and for even simple queries we had to load megabytes of XML documents into memory! We received over 50 packets per second, so you can imagine how horrible the performance became (BTW, the company went bankrupt).
So again, there is no absolute rule. Yes, sometimes optimization too early is a mistake. But sometimes the "cpu/disk space/memory is cheap" motto is the real root of all evil.
On an old project we inherited some (otherwise excellent) embedded systems programmers who had massive Z-8000 experience.
Our new environment was 32-bit Sparc Solaris.
One of the guys went and changed all ints to shorts to speed up our code, since grabbing 16 bits from RAM was quicker than grabbing 32 bits.
I had to write a demo program to show that grabbing 32-bit values on a 32-bit system was faster than grabbing 16-bit values, and explain that to grab a 16-bit value the CPU had to make a 32-bit wide memory access and then mask out or shift the bits not needed for the 16-bit value.
Oh good Lord, I think I have seen them all. More often than not it is an effort to fix performance problems by someone who is too darn lazy to troubleshoot their way down to the CAUSE of those performance problems, or even to research whether there actually IS a performance problem. In many of these cases I wonder if it isn't just a case of that person wanting to try a particular technology and desperately looking for a nail that fits their shiny new hammer.
Here's a recent example:
Data architect comes to me with an elaborate proposal to vertically partition a key table in a fairly large and complex application. He wants to know what type of development effort would be necessary to adjust for the change. The conversation went like this:
Me: Why are you considering this? What is the problem you are trying to solve?
Him: Table X is too wide, we are partitioning it for performance reasons.
Me: What makes you think it is too wide?
Him: The consultant said that is way too many columns to have in one table.
Me: And this is affecting performance?
Him: Yes, users have reported intermittent slowdowns in the XYZ module of the application.
Me: How do you know the width of the table is the source of the problem?
Him: That is the key table used by the XYZ module, and it is like 200 columns. It must be the problem.
Me (Explaining): But module XYZ in particular uses most of the columns in that table, and the columns it uses are unpredictable because the user configures the app to show the data they want to display from that table. It is likely that 95% of the time we'd wind up joining all the tables back together anyway, which would hurt performance.
Him: The consultant said it is too wide and we need to change it.
Me: Who is this consultant? I didn't know we hired a consultant, nor did they talk to the development team at all.
Him: Well, we haven't hired them yet. This is part of a proposal they offered, but they insisted we needed to re-architect this database.
Me: Uh huh. So the consultant who sells database re-design services thinks we need a database re-design....
The conversation went on and on like this. Afterward, I took another look at the table in question and determined that it could probably be narrowed with some simple normalization, with no need for exotic partitioning strategies. This, of course, turned out to be a moot point once I investigated the performance problems (previously unreported) and tracked them down to two factors:
Missing indexes on a few key columns.
A few rogue data analysts who were periodically locking key tables (including the "too-wide" one) by querying the production database directly with MS Access.
Of course the architect is still pushing for vertical partitioning of the table, hanging on to the "too wide" meta-problem. He even bolstered his case by getting a proposal from another database consultant who was able to determine we needed major design changes to the database without looking at the app or running any performance analysis.
I have seen people using alphadrive-7 to totally incubate CHX-LT. This is an uncommon practice. The more common practice is to initialize the ZT transformer so that bufferication is reduced (due to greater net overload resistance) and create java style bytegraphications.
Totally pessimistic!
Nothing Earth-shattering, I admit, but I've caught people using StringBuffer to concatenate Strings outside of a loop in Java. It was something simple like turning
String msg = "Count = " + count + " of " + total + ".";
into
StringBuffer sb = new StringBuffer("Count = ");
sb.append(count);
sb.append(" of ");
sb.append(total);
sb.append(".");
String msg = sb.toString();
It used to be quite common practice to use the technique in a loop, because it was measurably faster. The thing is, StringBuffer is synchronized, so there's actually extra overhead if you're only concatenating a few Strings. (Not to mention that the difference is absolutely trivial on this scale.) Two other points about this practice:
StringBuilder is unsynchronized, so it should be preferred over StringBuffer in cases where the buffer won't be accessed from multiple threads.
Modern Java compilers will turn readable String concatenation into optimized bytecode for you when it's appropriate anyway.
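For completeness, a small sketch of the case where an explicit builder does pay off, namely concatenation inside a loop; the parts list and variable names are illustrative only:

class ConcatDemo {
    public static void main(String[] args) {
        java.util.List<String> parts = java.util.Arrays.asList("alpha", "beta", "gamma");

        // Quadratic: every += copies everything accumulated so far into a new String.
        String slow = "";
        for (String part : parts) {
            slow += part;
        }

        // Linear: one growable buffer, a single copy out at the end.
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        String fast = sb.toString();

        System.out.println(slow.equals(fast));   // true; only the cost differs
    }
}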
I once saw an MSSQL database that used a 'Root' table. The Root table had four columns: GUID (uniqueidentifier), ID (int), LastModDate (datetime), and CreateDate (datetime). Every table in the database was foreign-keyed to the Root table. Whenever a new row was created in any table in the db, you had to use a couple of stored procedures to insert an entry in the Root table before you could get to the actual table you cared about (rather than the database doing the job for you with a few simple triggers).
This created a mess of useless overhead and headaches, required anything written on top of it to use sprocs (eliminating my hopes of introducing LINQ to the company; it was possible, just not worth the headache), and to top it off didn't even accomplish what it was supposed to do.
The developer that chose this path defended it under the assumption that this saved tons of space because we weren't using Guids on the tables themselves (but...isn't a GUID generated in the Root table for every row we make?), improved performance somehow, and made it "easy" to audit changes to the database.
Oh, and the database diagram looked like a mutant spider from hell.
How about POBI -- pessimization obviously by intent?
A colleague of mine in the 90s was tired of getting kicked in the ass by the CEO just because the CEO spent the first day of every release of our (custom) ERP software hunting for performance issues in the new functionality. Even if the new functionality crunched gigabytes and made the impossible possible, he always found some detail, or even a seemingly major issue, to whine about. He believed he knew a lot about programming and got his kicks by kicking programmer asses.
Due to the incompetent nature of the criticism (he was a CEO, not an IT guy), my colleague never managed to get it right. If you do not have a performance problem, you cannot eliminate it...
Until, for one release, he put a lot of Delay(200) calls (it was Delphi) into the new code.
It took just 20 minutes after go-live before he was ordered to appear in the CEO's office to fetch his overdue insults in person.
The only unusual thing was that my colleague stayed mute when he returned, smiling, joking, going out for a Big Mac or two, while he would normally kick tables, flame about the CEO and the company, and spend the rest of the day thoroughly dejected.
Naturally, my colleague then rested for a day or two at his desk, improving his aiming skills in Quake. On the second or third day he deleted the Delay calls, rebuilt, and released an "emergency patch", spreading the word that he had spent two days and one night fixing the performance holes.
This was the first (and only) time that evil CEO said "great job!" to him. That's all that counts, right?
This was real POBI.
But it also is a kind of social process optimization, so it's 100% ok.
I think.
"Database Independence". This meant no stored procs, triggers, etc - not even any foreign keys.
var stringBuilder = new StringBuilder();
stringBuilder.Append(myObj.a + myObj.b + myObj.c + myObj.d);
string cat = stringBuilder.ToString();
Best use of a StringBuilder I've ever seen.
Using a regex to split a string when a simple string.split suffices
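A hedged Java rendering of the idea (in Java, String.split itself takes a regex, so the contrast is starker in languages such as C# where Regex and string.Split are separate APIs; csvLine is a made-up input):

class SplitDemo {
    public static void main(String[] args) {
        String csvLine = "a,b,c";

        // Heavyweight: spinning up the regex machinery explicitly just to break on a comma.
        String[] viaRegex = java.util.regex.Pattern.compile(",").split(csvLine);

        // Usually sufficient: the plain split on the literal separator
        // (or an indexOf/substring loop if even that ever shows up in a profile).
        String[] viaSplit = csvLine.split(",");

        System.out.println(viaRegex.length == viaSplit.length);   // true
    }
}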
Very late to this thread I know, but I saw this recently:
bool isFinished = GetIsFinished();
switch (isFinished)
{
case true:
DoFinish();
break;
case false:
DoNextStep();
break;
default:
DoNextStep();
}
Y'know, just in case a boolean had some extra values...
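For contrast, the whole switch reduces to a plain conditional (keeping the hypothetical method names from the snippet above):

if (GetIsFinished()) {
    DoFinish();
} else {
    DoNextStep();
}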
Worst example I can think of is an internal database at my company containing information on all employees. It gets a nightly update from HR and has an ASP.NET web service on top. Many other apps use the web service to populate things like search/dropdown fields.
The pessimism is that the developer thought repeated calls to the web service would be too slow if each one made its own SQL query. So what did he do? The application start event reads in the entire database and converts it all to objects in memory, held indefinitely until the app pool is recycled. This code was so slow that it took 15 minutes to load fewer than 2,000 employees. If you inadvertently recycled the app pool during the day, it could take 30 minutes or more, because each incoming web service request would kick off another concurrent reload. As a result, new hires wouldn't show up in the dependent applications on the day their account was created and therefore couldn't access most internal apps for their first couple of days, left twiddling their thumbs.
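As a hedged illustration of the "multiple concurrent reloads" part, this is the kind of guard the code apparently lacked; the class, types, and loader are all invented for the sketch:

import java.util.ArrayList;
import java.util.List;

class EmployeeCache {
    static class Employee { String name; }

    private static volatile List<Employee> cache;

    static List<Employee> get() {
        List<Employee> local = cache;
        if (local == null) {
            synchronized (EmployeeCache.class) {
                local = cache;
                if (local == null) {
                    // Without a guard like this, every request that arrives while
                    // the cache is empty kicks off its own full reload.
                    local = loadAllEmployeesFromDatabase();
                    cache = local;
                }
            }
        }
        return local;
    }

    private static List<Employee> loadAllEmployeesFromDatabase() {
        return new ArrayList<>();   // stand-in for the 15-minute full load described above
    }
}

(Even with such a guard, simply querying on demand, as the author implies, would have been the simpler design.)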
The second level of pessimism is that the development manager doesn't want to touch it for fear of breaking dependent applications, yet we continue to have sporadic company-wide outages of critical applications due to the poor design of such a simple component.
No one seems to have mentioned sorting, so I will.
Several different times, I've discovered that someone had hand-crafted a bubble sort, because the situation "didn't require" a call to the "too fancy" quicksort algorithm that already existed. The developer was satisfied when their hand-crafted bubble sort worked well enough on the ten rows of data they were using for testing. It didn't go over quite as well after the customer had added a couple of thousand rows.
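A hedged sketch of the contrast, with made-up data types: the hand-rolled quadratic sort next to the library sort that was already available:

import java.util.Arrays;

class SortContrast {
    // Hand-rolled bubble sort: fine on ten test rows, O(n^2) on a few thousand.
    static void bubbleSort(int[] a) {
        for (int i = 0; i < a.length - 1; i++) {
            for (int j = 0; j < a.length - 1 - i; j++) {
                if (a[j] > a[j + 1]) {
                    int tmp = a[j];
                    a[j] = a[j + 1];
                    a[j + 1] = tmp;
                }
            }
        }
    }

    // The "too fancy" alternative: the library's O(n log n) sort.
    static void librarySort(int[] a) {
        Arrays.sort(a);
    }
}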
I once worked on an app that was full of code like this:
1 tuple *FindTuple( DataSet *set, int target ) {
2 tuple *found = null;
3 tuple *curr = GetFirstTupleOfSet(set);
4 while (curr) {
5 if (curr->id == target)
6 found = curr;
7 curr = GetNextTuple(curr);
8 }
9 return found;
10 }
Simply removing found, returning null at the end, and changing the sixth line to:
return curr;
Doubled the app performance.
I once had to attempt to modify code that included these gems in the Constants class
public static String COMMA_DELIMINATOR=",";
public static String COMMA_SPACE_DELIMINATOR=", ";
public static String COLIN_DELIMINATOR=":";
Each of these was used multiple times in the rest of the application for different purposes. COMMA_DELIMINATOR littered the code with over 200 uses in 8 different packages.
The big all-time number one which I run into time and time again in in-house software:
Not using the features of the DBMS for "portability" reasons because "we might want to switch to another vendor later".
Read my lips. For any in-house work: IT WILL NOT HAPPEN!
I had a co-worker who was trying to outwit our C compiler's optimizer and routinely rewrote code that only he could read. One of his favorite tricks was changing a readable method like (making up some code):
int some_method(int input1, int input2) {
int x;
if (input1 == -1) {
return 0;
}
if (input1 == input2) {
return input1;
}
... a long expression here ...
return x;
}
into this:
int some_method(int input1, int input2) {
return (input1 == -1) ? 0 : (input1 == input2) ? input1 :
... a long expression ...
... a long expression ...
... a long expression ...
}
That is, the first line of a once-readable method would become "return" and all other logic would be replaced by deeply nested ternary expressions. When you tried to argue about how unmaintainable this was, he would point to the fact that the assembly output of his method was three or four assembly instructions shorter. It wasn't necessarily any faster, but it was always a tiny bit shorter. This was an embedded system where memory usage occasionally did matter, but there were far easier optimizations available that would have left the code readable.
Then, after this, for some reason he decided that ptr->structElement was too unreadable, so he started changing all of these into (*ptr).structElement on the theory that it was more readable and faster as well.
Turning readable code into unreadable code for at the most a 1% improvement, and sometimes actually slower code.
In one of my first jobs as a full-fledged developer, I took over a project for a program that was suffering scaling issues. It would work reasonably well on small data sets, but would completely crash when given large quantities of data.
As I dug in, I found that the original programmer had sought to speed things up by parallelizing the analysis - launching a new thread for each additional data source. However, he'd made a mistake: all the threads required a shared resource, on which they were deadlocking. Of course, all benefits of concurrency disappeared. Moreover, launching 100+ threads only to have all but one of them lock up crashed most systems. My beefy dev machine was an exception in that it churned through a 150-source dataset in around 6 hours.
So to fix it, I removed the multi-threading components and cleaned up the I/O. With no other changes, execution time on the 150-source dataset dropped below 10 minutes on my machine, and from infinity to under half an hour on the average company machine.
I suppose I could offer this gem:
unsigned long isqrt(unsigned long value)
{
unsigned long tmp = 1, root = 0;
#define ISQRT_INNER(shift) \
{ \
if (value >= (tmp = ((root << 1) + (1 << (shift))) << (shift))) \
{ \
root += 1 << shift; \
value -= tmp; \
} \
}
// Find out how many bytes our value uses
// so we don't do any unneeded work.
if (value & 0xffff0000)
{
if ((value & 0xff000000) == 0)
tmp = 3;
else
tmp = 4;
}
else if (value & 0x0000ff00)
tmp = 2;
switch (tmp)
{
case 4:
ISQRT_INNER(15);
ISQRT_INNER(14);
ISQRT_INNER(13);
ISQRT_INNER(12);
case 3:
ISQRT_INNER(11);
ISQRT_INNER(10);
ISQRT_INNER( 9);
ISQRT_INNER( 8);
case 2:
ISQRT_INNER( 7);
ISQRT_INNER( 6);
ISQRT_INNER( 5);
ISQRT_INNER( 4);
case 1:
ISQRT_INNER( 3);
ISQRT_INNER( 2);
ISQRT_INNER( 1);
ISQRT_INNER( 0);
}
#undef ISQRT_INNER
return root;
}
Since the square-root was calculated at a very sensitive place, I got the task of looking into a way to make it faster. This small refactoring reduced the execution time by a third (for the combination of hardware and compiler used, YMMV):
unsigned long isqrt(unsigned long value)
{
unsigned long tmp = 1, root = 0;
#define ISQRT_INNER(shift) \
{ \
if (value >= (tmp = ((root << 1) + (1 << (shift))) << (shift))) \
{ \
root += 1 << shift; \
value -= tmp; \
} \
}
ISQRT_INNER (15);
ISQRT_INNER (14);
ISQRT_INNER (13);
ISQRT_INNER (12);
ISQRT_INNER (11);
ISQRT_INNER (10);
ISQRT_INNER ( 9);
ISQRT_INNER ( 8);
ISQRT_INNER ( 7);
ISQRT_INNER ( 6);
ISQRT_INNER ( 5);
ISQRT_INNER ( 4);
ISQRT_INNER ( 3);
ISQRT_INNER ( 2);
ISQRT_INNER ( 1);
ISQRT_INNER ( 0);
#undef ISQRT_INNER
return root;
}
Of course there are both faster AND better ways to do this, but I think it's a pretty neat example of a pessimization.
Edit: Come to think of it, the unrolled loop was actually also a neat pessimization. Digging through the version control, I can present the second stage of refactoring as well, which performed even better than the above:
unsigned long isqrt(unsigned long value)
{
unsigned long tmp = 1 << 30, root = 0;
while (tmp != 0)
{
if (value >= root + tmp) {
value -= root + tmp;
root += tmp << 1;
}
root >>= 1;
tmp >>= 2;
}
return root;
}
This is exactly the same algorithm, albeit a slightly different implementation, so I suppose it qualifies.
This might be at a higher level than what you were after, but fixing it (if you're allowed) also involves a higher level of pain:
Insisting on hand rolling an object-relational mapper / data access layer instead of using one of the established, tested, mature libraries out there (even after they've been pointed out to you).
All foreign-key constraints were removed from a database, because otherwise there would be so many errors.
This doesn't exactly fit the question, but I'll mention it anyway as a cautionary tale. I was working on a distributed app that was running slowly, and flew down to DC to sit in on a meeting primarily aimed at solving the problem. The project lead started to outline a re-architecture aimed at resolving the delay. I volunteered that I had taken some measurements over the weekend that isolated the bottleneck to a single method. It turned out there was a missing record in a local lookup, causing the application to go to a remote server on every transaction. Adding the record back to the local store eliminated the delay - problem solved. Note that the re-architecture wouldn't have fixed the problem.
Checking before EVERY JavaScript operation whether the object you are operating upon exists.
if (myObj) { //or its evil cousin, if (myObj != null) {
label.text = myObj.value;
// we know label exists because it has already been
// checked in a big if block somewhere at the top
}
My problem with this type of code is that nobody seems to care what happens if it doesn't exist. Just do nothing? Give no feedback to the user?
I agree that "Object expected" errors are annoying, but this is not the best solution for that.
How about YAGNI extremism? It is a form of premature pessimization. It seems like any time you apply YAGNI, you end up needing it, resulting in 10 times the effort to add it later than if you had added it in the beginning. If you create a successful program, then odds are YOU ARE GOING TO NEED IT. If you are used to creating programs whose life runs out quickly, then continue to practice YAGNI, because then, I suppose, YAGNI.
Not exactly premature optimisation, but certainly misguided: this comes from the BBC website, from an article discussing Windows 7.
Mr Curran said that the Microsoft Windows team had been poring over every aspect of the operating system to make improvements.
"We were able to shave 400 milliseconds off the shutdown time by slightly trimming the WAV file shutdown music.
Now, I haven't tried Windows 7 yet, so I might be wrong, but I'm willing to bet that there are other issues in there that are more important than how long it takes to shut down. After all, once I see the 'Shutting down Windows' message, the monitor is turned off and I'm walking away - how does that 400 milliseconds benefit me?
Someone in my department once wrote a string class. An interface like CString, but without the Windows dependence.
One "optimization" they did was to not allocate any more memory than necessary. Apparently not realizing that the reason classes like std::string do allocate excess memory is so that a sequence of += operations can run in O(n) time.
Instead, every single += call forced a reallocation, which turned repeated appends into an O(n²) Schlemiel the Painter's algorithm.
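A minimal sketch of why the spare capacity matters, using a toy append buffer invented for illustration: with geometric growth, a run of n appends costs O(n) in total, whereas resizing to the exact length every time, as the "optimized" class effectively did, makes the same run O(n²).

import java.util.Arrays;

class GrowingBuffer {
    private char[] data = new char[16];
    private int length = 0;

    void append(char[] extra) {
        int needed = length + extra.length;
        if (needed > data.length) {
            // Geometric growth: on average each character is copied only a constant
            // number of times. The "optimized" string class effectively used
            // newCapacity = needed, forcing a full copy on every single append.
            int newCapacity = Math.max(needed, data.length * 2);
            data = Arrays.copyOf(data, newCapacity);
        }
        System.arraycopy(extra, 0, data, length, extra.length);
        length += extra.length;
    }

    @Override
    public String toString() {
        return new String(data, 0, length);
    }
}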
An ex-coworker of mine (a s.o.a.b., actually) was assigned to build a new module for our Java ERP that was supposed to collect and analyze customers' data (retail industry). He decided to split EVERY Calendar/Datetime field into its components (seconds, minutes, hours, day, month, year, day of week, bimester, trimester (!)) because "how else would I query for 'every Monday'?"
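For the record, a hedged sketch of how "every Monday" can be answered from a single stored timestamp; this uses the modern java.time API and invented names, and most databases expose an equivalent day-of-week function in SQL as well:

import java.time.DayOfWeek;
import java.time.LocalDateTime;

class PurchaseFilter {
    static boolean isMonday(LocalDateTime purchasedAt) {
        return purchasedAt.getDayOfWeek() == DayOfWeek.MONDAY;
    }
}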
No offense to anyone, but I just graded an assignment (Java) that had this (java.lang is imported implicitly, so the line accomplishes nothing):
import java.lang.*;