Performance issue - How to display a very large text file in J2ME

I want to display a very large text file, but I want to break it up so that after every 1024 characters shown (a page), the user has to press Next to read the next set of characters, while Back takes them to exactly the previous characters (if any).
Performance-wise, should I create an array of forms and store each 'page' up front, or should each form be created at the point the user presses Next or Previous?

Performance-wise, neither of the approaches you are considering is guaranteed to be problem-free. The MIDP 2 specification (JSR-118) does not enforce any specific performance requirements for cases like this; as a result, different devices may behave differently.
To stay on the safe side, you could implement both approaches and let users choose which one works better on their device.
Another option worth considering is an uneditable TextBox instead of a Form. TextBox's primary purpose is to display large chunks of text, so one may expect sensible device manufacturers to provide an implementation optimized for that. TextBox even has a getMaxSize method which will let you know whether the device "thinks" 1024 characters is too much for it.

Performance-wise? That would depend on the size; there is no shortcut with J2ME. Another option is to split the file into smaller chunks.
When reading from an InputStream, using read with an offset and length is more efficient:
InputStream is = getClass().getResourceAsStream(file);
byte[] b = new byte[1024];    // one page worth of bytes
int n = is.read(b, 0, 1024);  // n is the number of bytes actually read
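To tie the two answers together, here is a minimal sketch (untested, MIDP 2; names like PagerMIDlet, PAGE_SIZE and the resource path /big.txt are invented for illustration) that reads pages lazily and caches each page once read, so Back never re-reads the stream. Note that reading fixed-size byte blocks can split a multi-byte character if the file is UTF-8.

import java.io.InputStream;
import java.util.Vector;
import javax.microedition.lcdui.*;
import javax.microedition.midlet.MIDlet;

public class PagerMIDlet extends MIDlet implements CommandListener {
    private static final int PAGE_SIZE = 1024;
    private final Vector pages = new Vector();   // pages read so far (Strings)
    private InputStream in;
    private int current = -1;
    private final Command next = new Command("Next", Command.SCREEN, 1);
    private final Command back = new Command("Back", Command.BACK, 2);

    protected void startApp() {
        in = getClass().getResourceAsStream("/big.txt");
        showPage(0);
    }
    protected void pauseApp() {}
    protected void destroyApp(boolean unconditional) {}

    public void commandAction(Command c, Displayable d) {
        if (c == next) showPage(current + 1);
        else if (c == back && current > 0) showPage(current - 1);
    }

    private void showPage(int index) {
        try {
            // Read lazily; pages already seen are served from the cache.
            while (pages.size() <= index) {
                byte[] buf = new byte[PAGE_SIZE];
                int n = in.read(buf, 0, PAGE_SIZE);
                if (n <= 0) return;              // end of file: stay on this page
                pages.addElement(new String(buf, 0, n));
            }
            current = index;
            TextBox box = new TextBox("Page " + (index + 1),
                    (String) pages.elementAt(index), PAGE_SIZE,
                    TextField.ANY | TextField.UNEDITABLE);
            box.addCommand(next);
            box.addCommand(back);
            box.setCommandListener(this);
            Display.getDisplay(this).setCurrent(box);
        } catch (java.io.IOException e) {
            // report the error on a real device
        }
    }
}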
You can add a comment if you need anything else.
Thanks
:)

Related

Maintaining Maximum Line Length When Using Tabs Instead of Spaces?

Style guides for various languages recommend a maximum line length range of 80–100 characters for readability and comprehension.
My question is, how do people using tabs for indentation comply with the line length limits? Because, if the length of a tab is set to 2 in your editor, the same code could be over 80 characters long in someone else's editor with tab length set to 3/4/6/8.
Some suggest that I mentally see my tab length at 8 characters. This isn't realistic and won't let me focus on programming as I'd have to check it for every other line.
So, how do you (if you use only tabs in your code) do it?
Almost all languages have a convention for tab size, and most projects have a convention for tab size as well, which is usually (though not always) the same as the language's.
For example, Ruby is 2, Python & PHP are 4, C is 8, etc.
Sticking with the project's tab size, or the language's tab size if there is no obvious project convention, is by far the most sane thing to do, since that is what almost all people will use.
The ability to set a different tab size is probably the largest advantage of using tabs instead of spaces, and if a user wants to deviate from the standard, then that's fine, but there is no reasonable way to also comply with the maximum line length in every possible situation.
You can compare it to most web sites; they're designed for a 100% zoom level. Zooming in or changing the font size almost always works, but some minor things may break. It's practically impossible to design for every possible zoom level & font size, so artefacts are accepted.
If you want to make sure the code complies to both the line length and tab size, you should probably use spaces, and not "real" tabs. Personally, I feel these guidelines are exactly that: guidelines; and not strict rules, so I don't think it matters that much. Other people's opinion differs on this matter, though.
Personally, I like tabs, for a number of reasons which are not important now, but only deviate from the standard temporarily, usually when editing heavy-nested code where I would like some more clarity which block I'm editing.
You probably don't write very long lines often, so you probably won't have to reformat your code too often.
Therefore, you can use an approach where you periodically check your file for compliance, and fix anything that's out of whack (for example, you could do this before submitting the file to code-review).
Here's a trivially simple Python script that takes a file on stdin and tells you which lines exceed 80 chars:
import sys

TABSIZE = 8
MAXLENGTH = 80

for i, line in enumerate(sys.stdin):
    # expandtabs() advances to the next tab stop, unlike a plain replace
    if len(line.rstrip('\n').expandtabs(TABSIZE)) > MAXLENGTH:
        print("%d: line exceeds %d characters" % (i + 1, MAXLENGTH))
You can extend this in various ways: have it take a filename/directory and automatically scan everything in it, make the tab size or length configurable, etc.
Some editors can be set up to insert a number of spaces equal to your tab length instead of an actual tab (Visual Studio has this enabled by default). This allows you to use the tab key to indent while writing code, but also guarantees that the code will look the same when someone else reads it, regardless of what their tab length may be.
Things get a little trickier if the reader also needs to edit the code, as is the case in pretty much every development team. Teams will usually have tabs vs. spaces and tab length as part of their coding standard. Each developer is responsible for making their own commits align with the standard, generally by setting up their environment to do this automatically. Changes that don't meet the standard can be reverted, and the developer made to fix them before they get integrated.
If you are going to have style guidelines that everyone has to follow, then one way to ensure all code conforms to a maximum column width of 100 is to run a code formatter such as Clang Format on check-in. For example, it includes an option to set the ColumnLimit:
The column limit.
A column limit of 0 means that there is no column limit. In this case, clang-format will respect the input's line breaking decisions within statements unless they contradict other rules.
Once your team agrees on the settings, no one has to think about these details: on check-in, code formatting will be done automatically. This requires a degree of compromise, but once it is in place you stop thinking about it pretty quickly, and you just worry about your code rather than the formatting.
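For instance, a minimal .clang-format file along these lines (a sketch only; the style values besides ColumnLimit are assumptions to adapt to your team's standard) pins down both the column limit and how tabs are handled:

# sketch of a .clang-format file
BasedOnStyle: LLVM
ColumnLimit: 100
IndentWidth: 4
TabWidth: 4
UseTab: Never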
If you want to ensure you don't go over the limit prescribed in a team/project coding standard, then it would be irrational not to also have a team/project standard for tab width. Modern editors let you set the tab width, so if your team decides on one and everyone sets their editor to it, everyone will be on the same page.
Imagine a line that is 100 characters long for you, and someone comes by and says it's over the limit because their editor is set to a tab width of 8 while yours is 2. You could tell them they have the wrong tab width, and be making just as valid a point as they would telling you yours is wrong, if there's no agreed-upon standard. And if there isn't one, the 8-width guy will in turn be wrong when he writes a 100-character line and another developer who prefers 10-width tabs complains.
Where tabs don't inherently have a defined width, it's up to the team to either choose one or choose not to engage in irrational arguments about it.
That said, spaces are shown to be correlated with higher income, so, y'know... ;)

Transferring lots of objects with Guid IDs to the client

I have a web app that uses Guids as the PK in the DB for an Employee object and an Association object.
One page in my app returns a large amount of data showing all the Associations that all Employees may be part of.
So right now, I am sending to the client essentially a bunch of objects that look like:
{association_id: guid, employees: [guid1, guid2, ..., guidN]}
It turns out that many employees belong to many associations, so I am sending down the same Guids for those employees over and over again in these different objects. For example, it is possible that I am sending down 30,000 total guids across all associations in some cases, of which there are only 500 unique employees.
I am wondering if it is worth me building some kind of lookup index that I also send to the client like
{ 1: Guid1, 2: Guid2 ... }
and replacing all of the Guids in the objects I send down with those ints,
or if simply gzipping the response will compress it enough that this extra effort is not worth it?
Note: please don't get caught up in the details of if I should be sending down 30,000 pieces of data or not -- this is not my choice and there is nothing I can do about it (and I also can't change Guids to ints or longs in the DB).
You wrote at the end of your question:
Note: please don't get caught up in the details of if I should be
sending down 30,000 pieces of data or not -- this is not my choice and
there is nothing I can do about it (and I also can't change Guids to
ints or longs in the DB).
I think that is your main problem. Even if you reduce the size of the transferred data by a factor of 10, you still won't have solved the real issue. Let us think about the question: why does so much data need to be sent to the client (the web browser)?
The data on the client side is needed to display information to the user. The monitor is not large enough to show 30,000 items on one page, and no user is able to grasp that much information. So I am sure that you display only a small part of the information, in which case you should send only the small part that you actually display.
You don't describe how the guids will be used on the client side. If you need the information during row editing, for example, you can transfer the data only when the user starts editing; in that case you need to transfer the data for just one association.
If you need to display the guids directly, then you can't display all the information at once, so you can send the information for one page only. When the user scrolls or clicks the "next page" button, you can send the next portion of data. In this way you can dramatically reduce the size of the transferred data.
If you have no possibility to redesign that part of the application, you can implement your original suggestion: by replacing a GUID like "{7EDBB957-5255-4b83-A4C4-0DF664905735}" or "7EDBB95752554b83A4C40DF664905735" with a number like 123, you reduce its size from 34 characters to 3. If you additionally send an array of "guid mapping" elements like
123:"7EDBB95752554b83A4C40DF664905735",
you can reduce the original data size of 30000*34 = 1020000 bytes (about 1 MB) to 500*39 + 30000*3 = 19500 + 90000 = 109500 bytes (about 110 KB), roughly a tenth of the size. Using compression of dynamic data on the web server can reduce the size further.
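As a rough sketch of that mapping idea (not the poster's actual code; the class and method names are invented), the server can assign each unique GUID a small integer the first time it sees it, and ship the table once alongside the compact payload:

import java.util.*;

public class GuidIndexer {
    private final Map<UUID, Integer> index = new LinkedHashMap<UUID, Integer>();

    // Returns the small id for a GUID, assigning one on first sight.
    public int idFor(UUID guid) {
        Integer id = index.get(guid);
        if (id == null) {
            id = Integer.valueOf(index.size() + 1);
            index.put(guid, id);
        }
        return id.intValue();
    }

    // The lookup table to send once alongside the compact payload.
    public Map<UUID, Integer> table() {
        return Collections.unmodifiableMap(index);
    }

    public static void main(String[] args) {
        GuidIndexer idx = new GuidIndexer();
        UUID a = UUID.randomUUID(), b = UUID.randomUUID();
        // Two associations sharing employee 'a': its full GUID is sent only once, in the table.
        System.out.println("assoc1 employees: " + idx.idFor(a) + "," + idx.idFor(b));
        System.out.println("assoc2 employees: " + idx.idFor(a));
        System.out.println("table: " + idx.table());
    }
}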
In any case you should examine why your page is so slow. If the program runs on a LAN, then transferring even 1 MB of data can be quick enough. Probably the page is slow while placing the data on the web page. I mean the following: if you modify an element on the page, the positions of all existing elements have to be recalculated. If you work with detached DOM objects first and then place the whole portion of data on the page at once, you can improve the performance dramatically. You didn't post which technology you use in your web application, so I can't include any examples; if you use jQuery, for example, I could give an example to clarify what I mean.
The lookup index you propose is nothing else than a "custom" compression scheme. As amdmax stated, this will increase your performance if you have a lot of the same GUIDs, but so will gzip.
IMHO, the extra effort of writing the custom coding will not be worth it.
Oleg states correctly that it might be worth fetching the data only when the user needs it. But this of course depends on your specific requirements.
if simply gzipping the response will compress it enough that this extra effort is not worth it?
The answer is: Yes, it will.
Compressing the data will remove redundant parts as well as possible (depending on the algorithm) until decompression.
To be sure, generate the data both uncompressed and compressed and compare the results. You can count the duplicate GUIDs to estimate how big your data block would be with the dictionary-compression method, but I guess gzip will do better, because it can also compress syntactic elements like braces, colons, etc. inside your data object.
So what you are trying to accomplish is Dictionary compression, right?
http://en.wikibooks.org/wiki/Data_Compression/Dictionary_compression
Instead of Guids, which are 16 bytes long, you will get ints, which are 4 bytes long, plus a dictionary of key-value pairs associating each guid with some int value, right?
It will decrease your transfer time when many objects share the same id, but it will cost CPU time before the transfer to compress and after the transfer to decompress. So what is the amount of data you transfer: MB / GB / TB? And is there any good reason to compress it before sending?
I do not know how dynamic your data is, but I would
on the first call, send two directories/dictionaries mapping short ids to long GUIDs, one for your associations and one for your employees, e.g. {1: AssoGUID1, 2: AssoGUID2, ...} and {1: EmpGUID1, 2: EmpGUID2, ...}. These directories may also contain additional information on the Association and Employee instances; I suspect you do not simply display GUIDs
on subsequent calls, just send the index of Employees per Association, { 1: [2,4,5], 3: [2,4], ... }, the key being the association's short id and the values in the array being the short ids of its employees. Given your description, building the reverse index (Employee to Associations) may give a better result size-wise (but with higher processing cost)
Then it's all down to associative-array manipulation, which is straightforward in JS.
Again, if your data is (very) dynamic on the server side, the two directories will soon become obsolete, and maintaining synchronization may cost you a lot.
I would start by answering the following questions:
What are the performance requirements? Are there size requirements? Speed requirements? What is the minimum performance that is truly needed?
What are the current performance metrics? How far are you from the requirements?
You characterized the data as possibly being mostly repeats. Is that the normal case? If not, what is?
The 2 options you listed above sound reasonable and trivial to implement. Try creating a look-up table and see what performance gains you get on actual queries. Try zipping the results (with look-ups and without), and see what gains you get.
In my experience, if you're not TOO far from the goal, meeting performance requirements is often a matter of trial and error.
If those options don't get you close to the requirements, I would take a step back and see if the requirements are reasonable in the time you have to solve the problem.
What you do next depends on which performance goals are lacking. If it is size, you're starting to be limited if you're required to send the entire association list every time. Is that truly a requirement? Can you send the entire list once, and then just updates?

Performance: Need to read from LONGTEXT

I'm building a CMS-type webapp that allows users to enter arbitrary-sized blocks of HTML. These blocks are entered by the user in their admin area and inserted into their template of choice when a page is delivered.
I'm guessing a user is not going to add more than 50-100 blocks and I'm not going to be getting more than 1000 users any time soon.
I was planning on using MySQL's LONGTEXT type to store these, but I'm wondering if storing files in a directory will be more performant, as the Linux OS will cache them. Given that I'm building for at most (1000 * 100) text blocks, is there any reasonable performance worry with using MySQL?
Obviously I will be caching the HTML before delivery so I won't be reading these blocks on every delivery - reads will only occur when someone updates/creates new content.
I could use memcached/other cache/noSQL implementation or some other storage mechanism but I'm focusing on keeping it simple and delivering ASAP so don't want to introduce other stuff that I don't have experience with unless there's a significant performance worry.
Are the blocks of HTML content the only thing you are saving? If so, a file may be easiest.
However, it seems likely that you may want to save other bits of information along with the HTML and be able to query based on those bits of data. For example: date created, date last modified, name of the block, the user(s) who have edited the block.
If this is the case, then a database may be the best way to go. Since you said you do not expect to have many users (at least not at first), I would concentrate on finding the solution that is the fastest / most flexible to program, and focus on performance and caching after your website begins to grow in size.
I advise you to use flat files rather than MySQL to store this kind of data.
HTML is more of a "file" than a "value"; it doesn't have to live in a DB.
Moreover, you will likely get better performance.
You can also read this post.

Text difference patch

I am trying to write a piece of code that allows the user to type text into a textbox, which then gets saved on the server. When the user types some more text into the textbox, I want only the difference to be sent to the server.
Is there a diff algorithm for JS which I can use to send only information about the difference? Essentially, it should be able to tell the difference between the contents of two text boxes.
It could also be language agnostic and I can port it.
Thank you for your time.
UPDATE
In simple words: I have a text area which keeps saving the text in the box every X seconds. Now, to save bandwidth, I only want it to send the difference from the last saved revision (which I can, say, keep in a variable; initially this will be empty). The JS has to compute the difference between the last revision and the current state of the textbox, and generate a change list to send to the server.
UPDATE 2
Something like www.etherpad.com
Google's diff-match-patch library has a JavaScript implementation; I've used it with much success.
http://code.google.com/p/google-diff-match-patch/
The Python difflib module does this and more. It's very flexible but might be challenging to port to Javascript.
Regarding your update, I'm first wondering why you need to worry about bandwidth. Unless your users are typing a lot of text into an edit box (which has its own usability issues) then there just aren't that many bytes to send. Send the whole text box each time you autosave. Users can't type fast enough to really notice the use of bandwidth.
Or, you could meet halfway. Every time you autosave, check to see whether the user has only added new text to the end compared to the last time. If so, send an "append" type update with just the new text. If the user has gone back and edited anything else, then send a "replace" type update where you send the whole text. This takes care of the common append-only case without severely complicating your implementation.
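A sketch of that halfway approach (hypothetical names; written in Java only because the asker said a language-agnostic version is fine to port):

public class AutosaveDiff {
    // Decide what to send, given the last saved text and the current text.
    public static String[] makeUpdate(String lastSaved, String current) {
        if (current.startsWith(lastSaved)) {
            // User only typed at the end: send just the new tail.
            return new String[] { "append", current.substring(lastSaved.length()) };
        }
        // User edited earlier text: fall back to sending everything.
        return new String[] { "replace", current };
    }

    public static void main(String[] args) {
        String[] u1 = makeUpdate("Hello", "Hello world");
        System.out.println(u1[0] + ": '" + u1[1] + "'");  // append: ' world'
        String[] u2 = makeUpdate("Hello world", "Hi world");
        System.out.println(u2[0] + ": '" + u2[1] + "'");  // replace: 'Hi world'
    }
}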
Instead of calculating a diff between two texts, which is difficult, you could record the keystrokes and the caret position in the textbox while people are editing. If you send these over every now and then (and clear the buffer), the server can play back the exact same sequence.
This code-smells of premature optimization. Perhaps you should implement your solution first and then see about optimizing your transfer rates using diffs. How much text are you looking at? Because the request and response packets are going to be more or less the same size with only a few bytes difference for your message, so the savings could be very minimal.
At the very least, complete your solution without optimization and profile your network traffic using tools like Firebug and then test to see how much worse the performance is with what you would consider to be the maximum text block that could be sent.
Finally, you could always use the TypeWatch jQuery plugin to listen for change events in the textbox. You can set a delay so that once the user finishes typing and the delay elapses, the callback function is triggered. This means that the text will only be sent when the user types something, and only when they are finished typing. This will be significantly more efficient than repeatedly polling the server.
Depends on how far you are ready to go.
You might want to check the delta algorithm used by svn in particular, svndiff: http://svn.apache.org/repos/asf/subversion/trunk/notes/svndiff

How does differential execution work?

I've seen a few mentions of this on Stack Overflow, but staring at Wikipedia (the relevant page has since been deleted) and at an MFC dynamic dialog demo did nothing to enlighten me. Can someone please explain this? Learning a fundamentally different concept sounds nice.
Based on the answers: I think I'm getting a better feel for it. I guess I just didn't look at the source code carefully enough the first time. I have mixed feelings about differential execution at this point. On the one hand, it can make certain tasks considerably easier. On the other hand, getting it up and running (that is, setting it up in your language of choice) is not easy (I'm sure it would be if I understood it better)...though I guess the toolbox for it need only be made once, then expanded as necessary. I think in order to really understand it, I'll probably need to try implementing it in another language.
Gee, Brian, I wish I had seen your question sooner. Since it's pretty much my "invention" (for better or worse), I might be able to help.
Inserted: The shortest possible explanation I can make is that if normal execution is like throwing a ball in the air and catching it, then differential execution is like juggling.
#windfinder's explanation is different from mine, and that's OK. This technique is not easy to wrap one's head around, and it's taken me some 20 years (off and on) to find explanations that work. Let me give it another shot here:
What is it?
We all understand the simple idea of a computer stepping along through a program, taking conditional branches based on the input data, and doing things. (Assume we are dealing only with simple structured goto-less, return-less code.) That code contains sequences of statements, basic structured conditionals, simple loops, and subroutine calls. (Forget about functions returning values for now.)
Now imagine two computers executing that same code in lock-step with each other, and able to compare notes. Computer 1 runs with input data A, and Computer 2 runs with input data B. They run step-by-step, side by side. If they come to a conditional statement like IF(test) .... ENDIF, and they have a difference of opinion on whether the test is true, then the one who says the test is false skips to the ENDIF and waits around for its sister to catch up. (This is why the code is structured: so we know the sister will eventually get to the ENDIF.)
Since the two computers can talk to each other, they can compare notes and give a detailed explanation of how the two sets of input data, and execution histories, are different.
Of course, in differential execution (DE) it is done with one computer, simulating two.
NOW, suppose you only have one set of input data, but you want to see how it has changed from time 1 to time 2. Suppose the program you're executing is a serializer/deserializer. As you execute, you both serialize (write out) the current data and deserialize (read in) the past data (which was written the last time you did this). Now you can easily see what the differences are between what the data was last time, and what it is this time.
The file you are writing to, and the old file you are reading from, taken together constitute a queue or FIFO (first-in-first-out), but that's not a very deep concept.
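To make the write-new/read-old idea concrete, here is a toy sketch in Java (my own invented names, not Mike's actual library; it handles only a single un-nested conditional): each pass serializes current values into one FIFO while deserializing the previous pass's values from the other, and reports the differences.

import java.util.ArrayDeque;
import java.util.Deque;

public class DiffExec {
    private Deque<Object> oldStream = new ArrayDeque<Object>(); // last pass
    private Deque<Object> newStream = new ArrayDeque<Object>(); // this pass

    // Serialize the new value, deserialize the old one, report a change.
    void value(String label, Object now) {
        Object before = oldStream.pollFirst();   // null on the very first pass
        newStream.addLast(now);
        if (before == null || !before.equals(now))
            System.out.println(label + ": " + before + " -> " + now);
    }

    // Serialize the branch taken so the old and new streams stay in sync.
    boolean branch(boolean now) {
        Boolean before = (Boolean) oldStream.pollFirst();
        newStream.addLast(Boolean.valueOf(now));
        if (before != null && before.booleanValue() && !now) {
            // Old pass took the branch, new pass does not: consume (and,
            // in a real system, "erase") the old branch's serialized data.
            while (oldStream.peekFirst() != null
                    && !(oldStream.peekFirst() instanceof Boolean))
                System.out.println("erase: " + oldStream.pollFirst());
        }
        return now;
    }

    void endPass() {   // the new stream becomes next pass's old stream
        Deque<Object> t = oldStream;
        oldStream = newStream;
        newStream = t;
        newStream.clear();
    }

    // A tiny "display procedure": one bar whose height may change.
    void drawChart(int barHeight, boolean showLabel) {
        value("bar height", Integer.valueOf(barHeight));
        if (branch(showLabel))
            value("label", "units");
        endPass();
    }

    public static void main(String[] args) {
        DiffExec de = new DiffExec();
        de.drawChart(10, true);   // first pass: everything is reported as new
        de.drawChart(12, false);  // height changed, label erased
    }
}

The branch(...) call is the key move: serializing the boolean keeps the two streams in step, so when the old pass took a branch that the new pass skips, the old branch's data can be consumed and erased rather than throwing everything out of sync.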
What is it good for?
It occurred to me while I was working on a graphics project, where the user could construct little display-processor routines called "symbols" that could be assembled into larger routines to paint things like diagrams of pipes, tanks, valves, stuff like that. We wanted to have the diagrams be "dynamic" in the sense that they could incrementally update themselves without having to redraw the entire diagram. (The hardware was slow by today's standards.) I realized that (for example) a routine to draw a bar of a bar-chart could remember its old height and just incrementally update itself.
This sounds like OOP, doesn't it? However, rather than "make" an "object", I could take advantage of the predictability of the execution sequence of the diagram procedure. I could write the bar's height in a sequential byte-stream. Then to update the image, I could just run the procedure in a mode where it sequentially reads its old parameters while it writes the new parameters so as to be ready for the next update pass.
This seems stupidly obvious and would seem to break as soon as the procedure contains a conditional, because then the new stream and the old stream would get out of sync. But then it dawned on me that if they also serialized the boolean value of the conditional test, they could get back in sync.
It took a while to convince myself, and then to prove, that this would always work, provided a simple rule (the "erase mode rule") is followed.
The net result is that the user could design these "dynamic symbols" and assemble them into larger diagrams, without ever having to worry about how they would dynamically update, no matter how complex or structurally variable the display would be.
In those days, I did have to worry about interference between visual objects, so that erasing one would not damage others. However, now I use the technique with Windows controls, and I let Windows take care of rendering issues.
So what does it achieve? It means I can build a dialog by writing a procedure to paint the controls, and I do not have to worry about actually remembering the control objects or dealing with incrementally updating them, or making them appear/disappear/move as conditions warrant. The result is much smaller and simpler dialog source code, by about an order of magnitude, and things like dynamic layout or altering the number of controls or having arrays or grids of controls are trivial. In addition, a control such as an Edit field can be trivially bound to the application data it is editing, and it will always be provably correct, and I never have to deal with its events. Putting in an edit field for an application string variable is a one-line edit.
Why is it hard to understand?
What I have found hardest to explain is that it requires thinking differently about software. Programmers are so firmly wedded to the object-action view of software that they want to know what are the objects, what are the classes, how do they "build" the display, and how do they handle the events, that it takes a cherry bomb to blast them out of it. What I try to convey is that what really matters is what do you need to say? Imagine you are building a domain-specific language (DSL) where all you need to do is tell it "I want to edit variable A here, variable B there, and variable C down there" and it would magically take care of it for you. For example, in Win32 there is this "resource language" for defining dialogs. It is a perfectly good DSL, except it doesn't go far enough. It doesn't "live in" the main procedural language, or handle events for you, or contain loops/conditionals/subroutines. But it means well, and Dynamic Dialogs tries to finish the job.
So, the different mode of thinking is: to write a program, you first find (or invent) an appropriate DSL, and code as much of your program in that as possible. Let it deal with all the objects and actions that only exist for implementation's sake.
If you want to really understand differential execution and use it, there are a couple of tricky issues that can trip you up. I once coded it in Lisp macros, where these tricky bits could be handled for you, but in "normal" languages it requires some programmer discipline to avoid the pitfalls.
Sorry to be so long-winded. If I haven't made sense, I'd appreciate it if you'd point it out and I can try and fix it.
Added:
In Java Swing, there is an example program called TextInputDemo. It is a static dialog, taking 270 lines (not counting the list of 50 states). In Dynamic Dialogs (in MFC) it is about 60 lines:
#define NSTATE (sizeof(states)/sizeof(states[0]))
CString sStreet;
CString sCity;
int iState;
CString sZip;
CString sWholeAddress;

void SetAddress(){
    CString sTemp = states[iState];
    int len = sTemp.GetLength();
    sWholeAddress.Format("%s\r\n%s %s %s", sStreet, sCity, sTemp.Mid(len-3, 2), sZip);
}

void ClearAddress(){
    sWholeAddress = sStreet = sCity = sZip = "";
}

void CDDDemoDlg::deContentsTextInputDemo(){
    int gy0 = P(gy);
    P(www = Width()*2/3);
    deStartHorizontal();
        deStatic(100, 20, "Street Address:");
        deEdit(www - 100, 20, &sStreet);
    deEndHorizontal(20);
    deStartHorizontal();
        deStatic(100, 20, "City:");
        deEdit(www - 100, 20, &sCity);
    deEndHorizontal(20);
    deStartHorizontal();
        deStatic(100, 20, "State:");
        deStatic(www - 100 - 20 - 20, 20, states[iState]);
        if (deButton(20, 20, "<")){
            iState = (iState+NSTATE - 1) % NSTATE;
            DD_THROW;
        }
        if (deButton(20, 20, ">")){
            iState = (iState+NSTATE + 1) % NSTATE;
            DD_THROW;
        }
    deEndHorizontal(20);
    deStartHorizontal();
        deStatic(100, 20, "Zip:");
        deEdit(www - 100, 20, &sZip);
    deEndHorizontal(20);
    deStartHorizontal();
        P(gx += 100);
        if (deButton((www-100)/2, 20, "Set Address")){
            SetAddress();
            DD_THROW;
        }
        if (deButton((www-100)/2, 20, "Clear Address")){
            ClearAddress();
            DD_THROW;
        }
    deEndHorizontal(20);
    P((gx = www, gy = gy0));
    deStatic(P(Width() - gx), 20*5, (sWholeAddress != "" ? sWholeAddress : "No address set."));
}
Added:
Here's example code to edit an array of hospital patients in about 40 lines of code. Lines 1-6 define the "database". Lines 10-23 define the overall contents of the UI. Lines 30-48 define the controls for editing a single patient's record. Note the form of the program takes almost no notice of events in time, as if all it had to do was create the display once. Then, if subjects are added or removed or other structural changes take place, it is simply re-executed, as if it were being re-created from scratch, except that DE causes incremental update to take place instead. The advantage is that you the programmer do not have to give any attention or write any code to make the incremental updates of the UI happen, and they are guaranteed correct. It might seem that this re-execution would be a performance problem, but it is not, since updating controls that do not need to be changed takes on the order of tens of nanoseconds.
1 class Patient {public:
2 String name;
3 double age;
4 bool smoker; // smoker only relevant if age >= 50
5 };
6 vector< Patient* > patients;
10 void deContents(){ int i;
11 // First, have a label
12 deLabel(200, 20, "Patient name, age, smoker:");
13 // For each patient, have a row of controls
14 FOR(i=0, i<patients.Count(), i++)
15 deEditOnePatient( P( patients[i] ) );
16 END
17 // Have a button to add a patient
18 if (deButton(50, 20, "Add")){
19 // When the button is clicked add the patient
20 patients.Add(new Patient);
21 DD_THROW;
22 }
23 }
30 void deEditOnePatient(Patient* p){
31 // Determine field widths
32 int w = (Width()-50)/3;
33 // Controls are laid out horizontally
34 deStartHorizontal();
35 // Have a button to remove this patient
36 if (deButton(50, 20, "Remove")){
37 patients.Remove(p);
38 DD_THROW;
39 }
40 // Edit fields for name and age
41 deEdit(w, 20, P(&p->name));
42 deEdit(w, 20, P(&p->age));
43 // If age >= 50 have a checkbox for smoker boolean
44 IF(p->age >= 50)
45 deCheckBox(w, 20, "Smoker?", P(&p->smoker));
46 END
47 deEndHorizontal(20);
48 }
Added: Brian asked a good question, and I thought the answer belonged in the main text here:
#Mike: I'm not clear on what the "if (deButton(50, 20, "Add")){" statement is actually doing. What does the deButton function do? Also, are your FOR/END loops using some sort of macro or something? – Brian.
#Brian: Yes, the FOR/END and IF statements are macros. The SourceForge project has a complete implementation. deButton maintains a button control. When any user input action takes place, the code is run in "control event" mode, in which deButton detects that it was pressed and signifies this by returning TRUE. Thus, the "if (deButton(...)){ ... action code ... }" construct is a way of attaching action code to the button, without having to create a closure or write an event handler. The DD_THROW is a way of terminating the pass when the action is taken, because the action may have modified application data, so it is invalid to continue the "control event" pass through the routine. If you compare this to writing event handlers, it saves you writing those, and it lets you have any number of controls.
Added: Sorry, I should explain what I mean by the word "maintains". When the procedure is first executed (in SHOW mode), deButton creates a button control and remembers its id in the FIFO. On subsequent passes (in UPDATE mode), deButton gets the id from the FIFO, modifies it if necessary, and puts it back in the FIFO. In ERASE mode, it reads it from the FIFO, destroys it, and does not put it back, thereby "garbage collecting" it. So the deButton call manages the entire lifetime of the control, keeping it in agreement with application data, which is why I say it "maintains" it.
The fourth mode is EVENT (or CONTROL). When the user types a character or clicks a button, that event is caught and recorded, and then the deContents procedure is executed in EVENT mode. deButton gets the id of its button control from the FIFO and asks if this is the control that was clicked. If it was, it returns TRUE so the action code can be executed; if not, it just returns FALSE. On the other hand, deEdit(..., &myStringVar) detects whether the event was meant for it, and if so passes it to the edit control, and then copies the contents of the edit control to myStringVar. Between this and normal UPDATE processing, myStringVar always equals the contents of the edit control. That is how "binding" is done. The same idea applies to scroll bars, list boxes, combo boxes, any kind of control that lets you edit application data.
Here's a link to my Wikipedia edit: http://en.wikipedia.org/wiki/User:MikeDunlavey/Difex_Article
Differential execution is a strategy for changing the flow of your code based on external events. This is usually done by manipulating a data structure of some kind to chronicle the changes. This is mostly used in graphical user interfaces, but is also used for things like serialization, where you are merging changes into an existing "state."
The basic flow is as follows:
Start loop:
    for each element in the datastructure:
        if element has changed from oldDatastructure:
            copy element from datastructure to oldDatastructure
            execute corresponding subroutine (display the new button in your GUI, for example)
End loop:
Allow the states of the datastructure to change (such as having the user do some input in the GUI)
The advantages of this are a few. One, it separates the execution of your changes from the actual manipulation of the supporting data, which is nice for multiple processors. Two, it provides a low-bandwidth method of communicating changes in your program.
Think of how a monitor works:
It is updated at 60 Hz -- 60 times a second. Flicker flicker flicker 60 times, but your eyes are slow and can't really tell. The monitor shows whatever is in the output buffer; it just drags this data out every 1/60th of a second no matter what you do.
Now why would you want your program to update the whole buffer 60 times a second if the image shouldn't change that often? What if you only change one pixel of the image, should you rewrite the entire buffer?
This is an abstraction of the basic idea: you want to change the output buffer based on what information you want displayed on the screen. You want to save as much CPU time and buffer write time as possible, so you don't edit parts of the buffer that need not be changed for the next screen pull.
The monitor is separate from your computer and logic (programs). It reads from the output buffer at whatever rate it updates the screen. We want our computer to stop synchronizing and redrawing unnecessarily. We can solve this by changing how we work with the buffer, which can be done in a variety of ways. His technique implements a FIFO queue that is on delay -- it holds what we just sent to the buffer. The delayed FIFO queue does not hold pixel data, it holds "shape primitives" (which might be pixels in your application, but it could also be lines, rectangles, easy-to-draw things because they are just shapes, no unnecessary data is allowed).
So you want to draw/erase things from the screen? No problem. Based on the contents of the FIFO queue I know what the monitor looks like at the moment. I compare my desired output (to erase or draw new primitives) with the FIFO queue and only change values that need to be changed/updated. This is the step which gives it the name Differential Execution.
Two distinct ways in which I appreciate this:
The First:
Mike Dunlavey uses a conditional-statement extension. The FIFO queue contains a lot of information (the "previous state" or the current stuff on monitor or time-based polling device). All you have to add to this is the state you want to appear on screen next.
A conditional bit is added to every slot that can hold a primitive in the FIFO queue.
0 means erase
1 means draw
However, we have previous state:
Was 0, now 0: don't do anything;
Was 0, now 1: add it to the buffer (draw it);
Was 1, now 1: don't do anything;
Was 1, now 0: erase it from the buffer (erase it from the screen);
This is elegant, because when you update something you really only need to know what primitives you want to draw to the screen -- this comparison will find out if it should erase a primitive or add/keep it to/in the buffer.
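A minimal sketch of that decision table (invented names; "buffer" stands in for whatever holds the primitives):

enum Action { NONE, DRAW, ERASE }

final class DrawBitDiff {
    // Decide what to do with one primitive given its old and new draw bits.
    static Action decide(boolean was, boolean now) {
        if (!was && now) return Action.DRAW;   // was 0, now 1: add it to the buffer
        if (was && !now) return Action.ERASE;  // was 1, now 0: erase it from the buffer
        return Action.NONE;                    // 0 -> 0 or 1 -> 1: do nothing
    }

    public static void main(String[] args) {
        System.out.println(decide(false, true));  // DRAW
        System.out.println(decide(true, false));  // ERASE
    }
}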
The Second:
This is just one example, and I think that what Mike is really getting at is something that should be fundamental in design for all projects: Reduce the (computational) complexity of design by writing your most computationally intense operations as computerbrain-food or as close as you can get. Respect the natural timing of devices.
A redraw method to draw the entire screen is incredibly costly, and there are other applications where this insight is incredibly valuable.
We are never "moving" objects around the screen. "Moving" is a costly operation if we are going to mimic the physical action of "moving" when we design code for something like a computer monitor. Instead, objects basically just flicker on and off with the monitor. Every time an object moves, it's now a new set of primitives and the old set of primitives flickers off.
Every time the monitor pulls from the buffer we have entries that look like
Draw bit primitive_description
0 Rect(0,0,5,5);
1 Circ(0,0,2);
1 Line(0,1,2,5);
Never does an object interact with the screen (or time-sensitive polling device). We can handle it more intelligently than an object will when it greedily asks to update the whole screen just to show a change specific to only itself.
Say we have a list of all possible graphical primitives our program is capable of generating, and that we tie each primitive to a set of conditional statements
if (iWantGreenCircle && iWantBigCircle && iWantOutlineOnMyCircle) ...
Of course, this is an abstraction and, really, the set of conditionals that represents a particular primitive being on/off could be large (perhaps hundreds of flags that must all evaluate to true).
If we run the program, we can draw to the screen at essentially the same rate at which we can evaluate all these conditionals. (Worst case: how long it takes to evaluate the largest set of conditional statements.)
Now, for any state in the program, we can simply evaluate all the conditionals and output to the screen lightning-quick! (We know our shape primitives and their dependent if-statements.)
This would be like buying a graphically-intense game. Only instead of installing it to your HDD and running it through your processor, you buy a brand-new board that holds the entirety of the game and takes as input: mouse, keyboard, and takes as output: monitor. Incredibly condensed conditional evaluation (as the most fundamental form of a conditional is logic gates on circuit boards). This would, naturally, be very responsive, but it offers almost no support in fixing bugs, as the whole board design changes when you make a tiny design change (because the "design" is so far-removed from the nature of the circuit board). At the expense of flexibility and clarity in how we represent data internally we have gained significant "responsiveness" because we are no longer doing "thinking" in the computer; it is all just reflex for the circuit board based on the inputs.
The lesson, as I understand it, is to divide labor such that you give each part of the system (not necessarily just computer and monitor) something it can do well. The "computer thinking" can be done in terms of concepts like objects... The computer brain will gladly try and think this all through for you, but you can simplify the task a great deal if you are able to let the computer think in terms of data_update and conditional_evals. Our human abstractions of concepts into code are idealistic, and in the case of internal program draw methods a little overly idealistic. When all you want is a result (array of pixels with correct color values) and you have a machine that can easily spit out an array that big every 1/60th of a second, try and eliminate as much flowery thinking from the computer brain as possible so that you can focus on what you really want: to synchronize your graphical updates with your (fast) inputs and the natural behavior of the monitor.
How does this map to other applications?
I'd like to hear of other examples, but I'm sure there are many. I think anything that provides a real-time "window" into the state of your information (variable state or something like a database... a monitor is just a window into your display buffer) can benefit from these insights.
I find this concept very similar to the state machines of classic digital electronics, especially the ones which remember their previous output.
A machine whose next output depends on current input and previous output according to (YOUR CODE HERE). This current input is nothing but previous output + (USER, INTERACT HERE).
Fill up a surface with such machines, and it will be user interactive and at the same time represent a layer of changeable data. But at this stage it will still be dumb, just reflecting user interaction to underlying data.
Next, interconnect the machines on your surface, let them share notes, according to (YOUR CODE HERE), and now we make it intelligent. It will become an interactive computing system.
So you just have to provide your logic at two places in the above model; the rest is taken care of by the machine design itself. That's what's good about it.
