All you Stackoverflowers,
I was wondering why GUI code is responsible for sucking away so many CPU cycles. In principle, the graphical rendering is far less complex than Doom (although most corporate GUIs will introduce lots of window dressing). The event-handling layer also seems to be a heavy cost; however, a well-written implementation should switch between contexts efficiently on modern processors with plenty of memory/cache.
If anybody has run a profiler on their big GUI application, or a common API itself, I'm interested in where the bottlenecks lie.
Possible explanations (that I imagine) may be:
High levels of abstraction between hardware and application interface
Lots of levels of indirection to the correct code to execute
Low priority (compared to other processes)
Misbehaving applications flooding API with calls
Excessive object orientation?
Outright poor design choices in the API (not just implementation issues, but design philosophy)
Some GUI frameworks are much better than others, so I'd like to hear varied perspectives. For example, the Unix/X11 system is very different from Windows, and even from WinForms.
Edit: Now a community wiki - go for it. I have one more thing to add -- I'm an algorithms guy in school and would be interested if there are inefficient algorithms in GUI code and which they are. Then again, it's probably just the implementation overhead.
I've no idea generally, but I'd like to add another item to your list - font rendering and calculations. Finding vector glyphs in a font and converting them to bitmap representations with anti-aliasing is no small task. And often it needs to be done twice - first to calculate the width/height of the text for positioning, and then actually drawing the text at the right coordinates.
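To make the double cost concrete, here's a minimal sketch of the measure-then-draw pattern using Win32 GDI (the helper and its parameters are purely illustrative, not code from any particular toolkit):

```cpp
// Measure-then-draw with Win32 GDI: both calls walk the font's glyph data,
// so the layout pass effectively pays the glyph-processing cost twice.
#include <windows.h>

void DrawCenteredLabel(HDC hdc, const RECT& bounds, const wchar_t* text)
{
    SIZE extent = {};
    // Pass 1: measure the string to decide where it should go.
    GetTextExtentPoint32W(hdc, text, lstrlenW(text), &extent);

    int x = bounds.left + ((bounds.right - bounds.left) - extent.cx) / 2;
    int y = bounds.top  + ((bounds.bottom - bounds.top) - extent.cy) / 2;

    // Pass 2: rasterize the same glyphs again, this time for real.
    TextOutW(hdc, x, y, text, lstrlenW(text));
}
```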
Also, most drawing code today relies on clipping mechanisms to update just a part of the GUI. So, if just one part needs to be redrawn, the code actually redraws the whole window behind the scenes and then copies just the needed part to the screen.
Added:
In the comments I found this:
I'm also very interested in this. It can't be that the GUI is rendered using only the CPU, because if you don't have proper drivers for your graphics card, desktop graphics render incredibly slowly. If you do have graphics drivers, however, desktop graphics go reasonably fast, but never as fast as a DirectX/OpenGL app.
Here's the deal as I understand it: every graphics card out there today supports a generic interface for drawing. I'm not sure if it's called "VESA", "SVGA", or if those are just old names from the past. Anyway, this interface involves doing everything through interrupts. For every pixel there is an interrupt call. Or something like that. The proper VGA driver, however, is able to take advantage of DMA and other enhancements that make the whole process WAY less CPU-intensive.
Added 2: Ah, and for OpenGL/DirectX - that's another feature of today's graphics cards. They are optimized for 3D operations in exclusive mode. That's where the speed comes from. The normal GUI just uses basic 2D drawing procedures, so it ends up sending the contents of the whole screen every time it wants an update. 3D applications, however, send a bunch of textures and triangle definitions to the VRAM (video RAM) and then just reuse them for drawing. They just say something like "take triangle set #38 with texture set #25 and draw them". All these things are cached in the VRAM, so this is again way faster.
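A rough sketch of that "upload once, reuse every frame" idea, using an OpenGL vertex buffer object (this assumes an existing GL context and GL 1.5+ entry points, e.g. via GLEW; the triangle data is just a placeholder):

```cpp
// "Upload once, draw many times": the geometry crosses the bus exactly once,
// and every later frame just references it by handle.
#include <GL/glew.h>   // assumes glewInit() was called after context creation

static const GLfloat kTriangle[] = {
    -0.5f, -0.5f, 0.0f,
     0.5f, -0.5f, 0.0f,
     0.0f,  0.5f, 0.0f,
};

GLuint UploadGeometry()
{
    GLuint vbo = 0;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    // One-time transfer of the vertices into driver/VRAM-managed storage.
    glBufferData(GL_ARRAY_BUFFER, sizeof(kTriangle), kTriangle, GL_STATIC_DRAW);
    return vbo;
}

void DrawGeometry(GLuint vbo)
{
    // Per frame: no vertex data is re-sent, just a handle and a draw command.
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, nullptr);
    glDrawArrays(GL_TRIANGLES, 0, 3);
    glDisableClientState(GL_VERTEX_ARRAY);
}
```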
I'm not sure, but I would suspect that the modern 3D-accelerated GUIs (Vista Aero, compiz on Linux, etc.) also might take advantage of this. They could send common bitmaps to the VGA up front and then just reuse them directly from the VRAM. Any application-drawn surfaces however would still need to be sent directly every time for updates.
Added 3: More ideas. :) The modern GUIs for Windows, Linux, etc. are widget-oriented (that's control-oriented for Windows speakers). The problem with this is that each widget has its own drawing code and associated drawing surface (more or less). When the window needs to be redrawn, it calls the drawing code for all its child widgets, which in turn call the drawing code for their child widgets, and so on. Every widget redraws its whole surface, even though some of it is obscured by other widgets. With the above-mentioned clipping techniques, some of this drawn information is immediately discarded to reduce flickering and other artifacts. But it's still lots of manual drawing code that includes bitmap blitting, stretching, skewing, drawing lines, text, flood-filling, etc., and all of this gets translated into a series of putpixel calls that get filtered through clipping filters/masks and other machinery. Ah, yes, and alpha blending has also become popular for nice effects, which means even more work. So... yes, you could say this is because of lots of abstraction and indirection. But... could you really do it any better? I don't think so. Only 3D techniques might help, because they can use the GPU for alpha calculations and clipping.
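For illustration, the recursive widget-paint pattern described above looks roughly like this (a hypothetical sketch; the Widget class and its members are made up, not any real toolkit's API):

```cpp
// Hypothetical widget tree: every widget paints its full surface and then
// recurses into its children, even though much of the output may end up
// clipped away or overdrawn by siblings that sit on top of it.
#include <vector>

class Widget {
public:
    virtual ~Widget() = default;

    void Paint() {
        PaintSelf();                    // blits, gradients, text, borders...
        for (Widget* child : children_) // then every child repeats the exercise
            child->Paint();
    }

    void AddChild(Widget* child) { children_.push_back(child); }

protected:
    virtual void PaintSelf() = 0;       // each widget type has its own drawing code

private:
    std::vector<Widget*> children_;
};
```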
Let's begin by saying that writing libraries is much harder than writing stand-alone code. The requirement that your abstraction be reusable in as many contexts as possible, including contexts which you haven't thought of yet, makes the task challenging even for experienced programmers.
Amongst libraries, writing a GUI toolkit library is a famously difficult problem. This is because the programs which use GUI libraries range over a very wide variety of domains with very different needs. Mr Why and Martin DeMollo discussed the requirements placed on GUI libraries a little while ago.
Writing GUI widgets themselves is difficult because computer users are very sensitive to minute details of the behavior of the interface. Non-native widgets never feel quite right, do they? In order to get non-native widgets right -- in order to get any widget right, in fact -- you need to spend an inordinate amount of time tweaking the details of the behavior.
So, GUIs are slow because of the inefficiencies introduced by the abstraction mechanisms used to create highly reusable components, added to the shortness of time available to optimize the code once so much time has been spent just getting the behavior right.
Uhm, that's quite a lot.
The simplest, and probably most obvious, answer is that the programmers behind these GUI apps are really bad programmers. You can go a long way writing code which does the most bizarre things and it will be faster, but few people seem to care how to do this, or they deem it an expensive, non-profitable waste of effort.
To set things straight: off-loading computations to the GPU won't necessarily fix any problems. The GPU is just like the CPU, except that it's less general-purpose and more of a data-parallel processor. It can do graphics computations exceptionally well. Whatever graphics API/OS and driver combination you have doesn't really matter that much... well, OK, with Vista as an example, they changed the desktop composition engine. The new engine is far better at compositing only that which has changed, and since the number one bottleneck for GUI apps is redrawing, that is a neat optimization strategy: virtualize your computational needs and update only the smallest change every time.
Win32 sends WM_PAINT messages to windows when they need to be redrawn; this can be a result of windows occluding each other. However, it's up to the window itself to figure out what has actually changed. More often than not, nothing changed, or the change was trivial enough that it could simply have been performed on top of whatever topmost surface you had.
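For reference, a minimal sketch of a Win32 WM_PAINT handler that at least limits its work to the invalid region reported by BeginPaint (the drawing inside it is just a placeholder):

```cpp
// WM_PAINT handler that restricts its work to the invalid region.
// PAINTSTRUCT::rcPaint is the smallest rectangle that actually needs
// updating; repainting the whole client area here would be pure waste.
#include <windows.h>

LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
{
    if (msg == WM_PAINT)
    {
        PAINTSTRUCT ps;
        HDC hdc = BeginPaint(hwnd, &ps);

        // GDI already clips output to ps.rcPaint, but skipping the widgets
        // that don't intersect it also saves the CPU work of drawing them.
        FillRect(hdc, &ps.rcPaint, (HBRUSH)(COLOR_WINDOW + 1));
        // ... draw only the elements that overlap ps.rcPaint ...

        EndPaint(hwnd, &ps);
        return 0;
    }
    return DefWindowProc(hwnd, msg, wParam, lParam);
}
```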
This kind of graphics handling doesn't necessarily exist today. I would say that people have refrained from writing really efficient, virtualizing rendering solutions because the benefit/cost ratio is poor.
Something Windows Presentation Foundation (WPF) does, which I think is far superior to most other GUI APIs, is that it splits layout updates and rendering updates into two separate passes. And while WPF is managed code, the rendering engine is not. What happens with rendering is that the managed WPF rendering layer builds a command queue (this is what DirectX and OpenGL do), which is then handed off to the native rendering engine. What's a bit more elegant here is that WPF will then try to retain any computation which didn't change the visual state. A trick, if you will, where you avoid costly rendering calls for things that don't have to be rendered (virtualizing).
In contrast to WM_PAINT, which tells a Win32 window to repaint itself, a WPF app checks which parts of the window require repainting and repaints only the smallest change.
Now, WPF is not supreme; it's a solid effort from Microsoft, but it's not the holy grail yet... the code which runs the pipeline could still be improved, and the memory footprint of any managed app is still more than I would want. But I hope this is the kind of answer you are looking for.
WPF is able to do some things asynchronously rather decently, which is a huge deal if you want to make a really responsive, low-latency/low-CPU UI. Asynchronous operation is more than just off-loading work onto a different thread.
To summarize: a slow and expensive GUI means too much repainting, and the kind of repainting which is very expensive, i.e. the entire surface area.
It does to some degree depend on the language. You might have noticed that Java and RealBasic applications are a fair bit slower than their C-based (C++, C#, Objective-C) counterparts.
However, GUI applications are much more complex than command-line apps. The Terminal window only needs to draw a simple window that doesn't support buttons.
There are also multiple loops for extra inputs and features.
I think that you can find some interesting thoughts on this topic in "Window System Design: If I had it to do over again in 2002" by James Gosling (the Java guy, also known for his work on pre-X11 windowing systems). Available online here[pdf].
The article focuses on the positive side (how to make it fast), not on the negative side (what's making it slow), but it is still a good read on the topic.
Allow me to explain, so that this doesn't just get marked as an "opinion-based" question.
I'm learning processing.js right now, and I can't help but notice many of the similarities in functionality with what already exists in the Canvas API of Vanilla-JS. Perhaps writing a set of large-scale animations is much more complicated in plain old Canvas than it is in processing?
I'm asking this because, as I continue to learn more about the vanilla APIs, I'm seeing a lot of new functionality added to JS over the years that is starting to (VERY SLOWLY) make certain aspects of popular frameworks no longer necessary (jQuery being a great example). I'm curious as to whether or not this is the case with Canvas and processing.js as well.
Personally, I'm trying to determine whether or not I should still be spending a lot of time in processing.js (I'm not asking you to make that decision for me though, but I just want some information that can help me decide what's best for me).
Stack Overflow allows specific non-coding questions about programming tools like ProcessingJS, but your question seems likely to be closed as too broad.
Even so, here are my thoughts...
Native Canvas versus ProcessingJS
Html5 canvas was born with a rich set of possibilities rivaling Photoshop itself. However, native canvas is a relatively low-level tool where you must handle structuring, eventing, serialization and animation with your own code.
ProcessingJS adds structure, eventing, serialization, animation & many (amazing!) mathematical functions to native canvas. IMHO, ProcessingJS is a higher-level tool that's well worth learning.
Extending native canvas into a higher level tool instead of a low-level tool
With about 500 lines of JavaScript, you can add a reusable framework to native canvas that provides these features within a higher-level structure: eventing (including drag/drop, scaling, rotating, hit testing, etc.) and serialization/deserialization.
With about 100 more lines you can add a reusable framework to native canvas that does animation with easing.
Even though native canvas was born with most of the capabilities needed to present even complex content, a PathObject is sorely needed in native canvas. The PathObject would serialize paths to make them reusable. With about 50 lines you can create a reusable PathObject.
Here's a fairly useless recommendation :-p
Try to use the right tool for the job (yeah, not specifically helpful).
Learning native canvas alone will let you do maybe 70% of pixel display tasks.
Coding the extensions (above) will get you to 90%.
Using a tool like ProcessingJS will get you to 98%.
Yes, there are always about 2% edge cases where you either "can't get there" or must reduce your design requirements to accommodate coding limitations.
A slightly more specific recommendation
Since ProcessingJS merely extends native canvas, IMHO it's well worthwhile to take a few days and learn native canvas. This knowledge will let you determine the right tool for the job.
I may be a bit in over my head here, but if you never try new things, you'll never learn, I suppose. I'm working with some multi-touch stuff and have built myself a small but functional GUI library. Up until now I've been using Boost's Signals2 library to distribute my detected gestures to all active GUI elements (whether on screen or not). I'm a big fan of avoiding premature optimization, so things have been hunky-dory until now.
I've used VS2013's profiler to find out that when the user goes touch-crazy (the device supports up to 41 simultaneous touches), my system grinds to a halt, and Signals2 is the culprit. Keep in mind that each touch can trigger a number of Gestures, which are all communicated to every GUI element that has registered to interact with that type of Gesture.
Now there are a number of ways to deal with this bottleneck:
Have GUI elements work more cleverly and disconnect them when they're off-screen.
Optimize the signals/slots system so the calls are resolved faster.
Prioritization of events.
I'm not a big fan of ever having to deal with 3 - if avoidable - as it'll directly impact the responsiveness of my application. Nr. 1 should probably be implemented, but I'm more interested in getting to Nr. 2 first.
I don't really need any big fancy stuff. The signals/slots system I need really only has to do the core emission work, along with these two features (roughly what the sketch after the list illustrates):
Slots must be able to return a value ending the emission - effectively cancelling any subsequent handling of a signal.
Slots must be orderable - and fairly efficient at it. GUI elements that are interacted with will pop up above others, so this type of change in order is bound to happen quite often.
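To make that concrete, here's roughly the kind of thing I have in mind -- a minimal, naive sketch (not the testbit code), where slots carry a depth, stay sorted, and can cancel the emission by returning true:

```cpp
// Minimal ordered, cancellable signal: slots are kept sorted by depth and
// a slot returning true stops the emission (the gesture was consumed).
#include <algorithm>
#include <functional>
#include <vector>

template <typename... Args>
class OrderedSignal {
public:
    using Slot = std::function<bool(Args...)>;   // return true to cancel further handling

    void connect(int depth, Slot slot) {
        entries_.push_back({depth, std::move(slot)});
        std::stable_sort(entries_.begin(), entries_.end(),
                         [](const Entry& a, const Entry& b) { return a.depth < b.depth; });
    }

    void emit(Args... args) const {
        for (const Entry& e : entries_)
            if (e.slot(args...))
                break;                            // a slot consumed the gesture
    }

private:
    struct Entry { int depth; Slot slot; };
    std::vector<Entry> entries_;
};
```

Usage would be something like connecting the topmost element with the lowest depth; reordering an element would then just be a disconnect and reconnect with a new depth.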
I stumbled across this really interesting implementation
https://testbit.eu/2013/cpp11-signal-system-performance/
which seems to have everything except the 'ordering' I need. I've only looked over the code once, and it does seem a little intimidating. If I were to try and add ordering capabilities, I'd prefer not to change too much stuff around if I can avoid it. Does anyone have experience with this stuff? I'm fairly certain that a linked list is not optimal for constant removal and insertion, but then again, it probably needs to be optimized most for constant emission calls.
Any thoughts are most welcome!
--- Update ---
I've spent a little time adding the features I needed to the public-domain code linked above and pasted the complete (and somewhat hacky) version here:
SimpleSignal with added features
In short, I've added:
Blockable Connections (Implemented via simple IF statement)
Depth/Order parameter (Linear-search insertion)
With those additions, keep in mind it has these current issues:
Blocked connections are simply skipped, not actively removed from the data-structure, so having a lot of blocked connections will affect run-time performance.
Depth is only maintained during insertion. So if you'd like to change the depth you'll have to disconnect and reconnect your slot.
Since the SignalLink interface has become exposed (as a result of my block implementation), it's less safe from a user perspective. It's way easier for you to shoot yourself in the foot with this version by messing with existing references and pointers.
This implementation hasn't been as thoroughly tested as the original I'm sure. I did try out the new functionality a bit. User beware.
I have a feeling there are combinations of Cocoa Quartz Compositions and GPUs which can't be handled by the GPU and which fall back on the software renderer, even if Core Image is "accelerated" normally. How would I detect such a situation?
Or more generally, how do I detect that a machine is too underpowered to handle a certain composition of a certain size, without actually playing the composition and measuring the FPS?
(Measuring the FPS through playing the composition in a hidden window is unlikely to work, since the QCView might detect that situation and optimise away the whole operation, or parts thereof. And even if it didn't do that today it might start doing that with the next update from Apple - it'd be an unreliable solution.)
Update: to be thorough I did write some code to test render the composition at full resolution in an ordered out but properly sized window, trying to force the render to happen with [self startRendering];[self snapshotImage];[self stopRendering];. This took an amount of time which looked reasonable at first, until it turned out the slow machine was faster at running this test than the fast one. ;) In reality the slow machine renders the composition at a measly 2.24 FPS vs 27 FPS on the fast machine.
I'm guessing you're asking so that you can make a simpler fallback animation for weaker systems?
One option may be to check the user's hardware string as is mentioned here:
GPU Chipset Detection.
glGetString can be queried for GL_VENDOR, GL_RENDERER, GL_VERSION, or GL_EXTENSIONS. You could theoretically use GL_VENDOR to identify Intel GMAs as too slow, or compare GL_RENDERER against a list of known poor-performing GPUs. If you're writing code for 10.6+ only, you only have to compare against GPUs used in Intel Macs, so the list shouldn't be too long.
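A hedged sketch of that check, assuming a current GL context (the "GMA" substring test is just the example heuristic from above; a real list of weak renderers would have to be maintained by hand):

```cpp
// Renderer-string heuristic (assumes a current OpenGL context on OS X).
#include <OpenGL/gl.h>
#include <cstring>

bool LooksTooSlowForComposition()
{
    const char* vendor   = reinterpret_cast<const char*>(glGetString(GL_VENDOR));
    const char* renderer = reinterpret_cast<const char*>(glGetString(GL_RENDERER));
    if (!vendor || !renderer)
        return true;   // no context or a broken driver: assume the worst

    return std::strstr(renderer, "GMA") != nullptr;   // e.g. "Intel GMA 950"
}
```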
This might not be quite the elegant solution you're looking for, but it should do the trick. I would also provide the user with an override to choose the higher or lower quality graphics if they wish.
This is a follow-up to this question. I'm currently writing a simple game and am looking for the fastest way to (repeatedly) display an array of RGB data in a Win32 window, without flickering or other artifacts.
Several different approaches were recommended in the answers to the previous question, but there was no consensus on which would be the fastest. So, I threw together a test program. The code simply displays a framebuffer on the screen repeatedly, as fast as possible.
These are the results I obtained, for 32-bit data running in a 32-bit video mode - they may surprise some people:
- Direct3D (1): 500 fps
- Direct3D (2): 650 fps
- DirectDraw (3): 1100 fps
- DirectDraw (4): 800 fps
- GDI (SetDIBitsToDevice): 2000 fps
Given these figures:
Why are many people adamant that GDI is simply too slow for this operation?
Is there any reason to prefer DirectDraw or Direct3D over SetDIBitsToDevice?
Here is a brief summary of the calls made by each of the Direct* codepaths. If anyone knows a more efficient way to use DirectDraw/Direct3D, please comment.
1. CreateTexture(D3DUSAGE_DYNAMIC, D3DPOOL_DEFAULT);
LockRect(); memcpy(); UnlockRect(); DrawPrimitive()
2. CreateTexture(0, D3DPOOL_SYSTEMMEM); CreateTexture(0, D3DPOOL_DEFAULT);
LockRect(); memcpy(); UnlockRect(); UpdateTexture(); DrawPrimitive()
3. CreateSurface(); SetSurfaceDesc(lpSurface = &frameBuffer[0]);
memcpy(); primarySurface->Blt();
4. CreateSurface();
Lock(); memcpy(); Unlock(); primarySurface->Blt();
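For completeness, the GDI path boils down to a single SetDIBitsToDevice call per frame, roughly like this (the parameters here are illustrative rather than the exact test code):

```cpp
// GDI path: push the 32-bit frame buffer straight to the window's DC.
#include <windows.h>
#include <vector>

void PresentFrame(HDC hdc, const std::vector<DWORD>& frameBuffer, int width, int height)
{
    BITMAPINFO bmi = {};
    bmi.bmiHeader.biSize        = sizeof(BITMAPINFOHEADER);
    bmi.bmiHeader.biWidth       = width;
    bmi.bmiHeader.biHeight      = -height;    // negative height = top-down rows
    bmi.bmiHeader.biPlanes      = 1;
    bmi.bmiHeader.biBitCount    = 32;
    bmi.bmiHeader.biCompression = BI_RGB;

    SetDIBitsToDevice(hdc, 0, 0, width, height,   // destination rectangle
                      0, 0, 0, height,            // source origin, start scan, scan count
                      frameBuffer.data(), &bmi, DIB_RGB_COLORS);
}
```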
There are a couple of things to keep in mind here. First of all, a lot of "common knowledge" is based on some facts that no longer really apply.
In the days of AGP, when the CPU talked directly to the GPU, it always used the base PCI protocol, which happened at the "1x" rate (always and inevitably). AGP 2x/4x/8x only applied when the GPU was talking to the memory controller directly. In other words, depending on when you looked, it was up to 8 times as fast to have the GPU load a texture from memory as it was for the CPU to send the same data directly to the GPU. Of course, the CPU also had a great deal more bandwidth to memory than the PCI bus supported.
When things switched to PCI-E, however, that changed completely. While there can be differences in bandwidth depending on path, there's no general rule that memory->GPU will be faster than CPU->GPU. The one generalization that's (mostly) safe is that if you have a dedicated graphics card, then the GPU will almost always have more bandwidth to the memory on the graphics card than it does to main memory on the motherboard.
In your case, that doesn't matter much though -- you're talking about moving data from CPU space to GPU space regardless. The main speed difference with using DirectX (or OpenGL) happens when you keep all (or most) of the computation on the GPU, and avoid using the CPU (or main memory) at all. They don't (now that AGP is history) provide any substantial improvement in memory->display bandwidth.
Jerry Coffin makes some good points. The thing to bear in mind is what the DI stands for in SetDIBitsToDevice: Device Independent. Which means you were ALWAYS at the mercy of drivers. Some drivers used to be complete rubbish, and it affected performance massively. DirectDraw suffered from similar issues as well... but you also had access to the hardware blitters, so it was generally more useful. IHVs also tended to put more time into writing proper drivers for DirectDraw because of its gaming association. Who wants to be at the bottom of the performance pile when the hardware is quite capable of doing better?
These days many graphics cards can accept the bit data directly so no conversion happens. If it does need to be swizzled this is also INCREDIBLY quick in this day and age.
The reason your Direct3D performance is so terrible, by comparison, is that Direct3D, by nature of the fact it is meant to be used totally internally to the GPU, uses odd and complex formats to improve cache performance and so forth.
Couple that with the fact that you aren't testing like for like (with DDraw and D3D) by creating a texture/surface, locking it, copying, unlocking and then drawing over the back buffer (via various methods). To get the best performance you'd be best off directly locking the back buffer using a DISCARD lock and then memcpy'ing directly into the returned buffer before unlocking. This will bring your performance much closer to SetDIBitsToDevice. I would still expect D3D to be slower than DDraw, however, for the reasons outlined above.
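A rough sketch of that lock-and-memcpy approach, assuming the swap chain was created with a lockable back buffer (error handling trimmed, and the lock flags kept conservative here):

```cpp
// Lock the back buffer and memcpy the frame into it, then Present().
// Assumes D3DPRESENTFLAG_LOCKABLE_BACKBUFFER was set at device creation.
#include <d3d9.h>
#include <cstring>

void BlitToBackBuffer(IDirect3DDevice9* device, const DWORD* frameBuffer,
                      int width, int height)
{
    IDirect3DSurface9* backBuffer = nullptr;
    if (FAILED(device->GetBackBuffer(0, 0, D3DBACKBUFFER_TYPE_MONO, &backBuffer)))
        return;

    D3DLOCKED_RECT locked;
    if (SUCCEEDED(backBuffer->LockRect(&locked, nullptr, 0)))
    {
        // Copy row by row, honouring the surface pitch (it may be padded).
        for (int y = 0; y < height; ++y)
            std::memcpy(static_cast<BYTE*>(locked.pBits) + y * locked.Pitch,
                        frameBuffer + y * width,
                        width * sizeof(DWORD));
        backBuffer->UnlockRect();
    }
    backBuffer->Release();
    // device->Present(...) afterwards puts the back buffer on screen.
}
```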
The reason you will hear people trounce GDI is that it used to be just the old Windows API calls. The newer versions of it (which were called GDI+ when I last looked at them) are actually just an API placed on top of DirectX calls. So using GDI may seem fairly simple programming-wise at times, but adding a layer between things always slows things down. As mentioned in the response from Jerry Coffin, your examples are about moving the data, and that is the slow part. I am a bit surprised that DirectX is that much slower, though, but I can't be much more help without digging through the DirectX documentation (which has been pretty good for quite some time, really). You might want to check out www.codesampler.com; I have always found good starting places from him, and, while I may be insane for saying this, I would swear the improvements to the DirectX SDK's docs and examples were based on this guy's work!
As for the DirectDraw vs. Direct3D (and not the GDI calls) discussion: I would say go with Direct3D. I believe DirectDraw has been deprecated since 8.0 or so, and 9.0 has been around for quite a long while. At the end of the day, all of DirectX is 3D; it just varies in the level of helpful 2D APIs that are around, but you may find you can do some very interesting things in a 2D environment when you are actually using 3D space. (I had a pretty neat randomly generated lightning weapon for a Space Invaders clone at one time. :))
Anywho, hope this helped!
PS: It should be noted that DirectX is not always the fastest. For keyboard input (unless this has changed in 10 or 11), it has pretty much always been recommended to use the Windows events, as DirectInput was actually just a wrapper for that system. XInput, however, is awesome!
I have been asked to investigate porting Wii games and some (Sony) PSOne games to OpenGL ES (can you guess what platform?).
I have never undertaken a game port like this before (and will be hiring someone to do it) but I'd like to understand the process.
Does the Wii use OpenGL? If not what does it use and how easy is it to port to OpenGL / OpenGL ES?
Are there any resources/books/blogs that will help me in understanding the process?
Will my company have to become an official Wii developer? If so where do I start that process?
Porting from the Wii or the PSOne is a complex and involved task that can be broken down into multiple separate engineering efforts working in parallel to produce a working end product. The best possible thing you can do before moving to the target hardware is to compartmentalize all of the non-portable code while ensuring that the game continues to run as expected. When you commit to moving to the new platform, your effort switches to reimplementing the non-portable compartmentalized parts.
So, to answer your question: yes, you will need to become, or work with, a Sony- and Nintendo-licensed developer in order to take this approach. In the case of Sony, I don't even know if they offer a PSOne development program anymore, which presents issues. Your Sony account rep can help clarify.
The major subsystems that are likely to be the focus of your porting effort are:
Rendering: Graphics code contains fundamental assumptions about the hardware it is being run on in order to perform optimally. API-level compatibility is superficial compatibility and does not get you as much as you may hope it does. Plan on finding the entry point to the renderer, determining what data you need to render a scene, and rewriting all the render code from there for your target hardware.
Game Saving: Game state serialization and archival will need to be separated out. Older games often fwrite() structs with #pragma pack'ed fields (see the sketch after this list). Is that still going to work for you?
Networking: Wii games write to high-level services that are unavailable on your target hardware. At the low level, sockets are still sockets. What network services do your Wii games rely on?
Controls: Given where you are coming from and where you are going, anything short of a full redesign or reimagining of input will result in poor reviews of the software.
Memory Management: Console games often make fundamental assumptions about the rate at which the system software returns memory from the heap, how much fragmentation it will cause, and the duration the game needs to operate under these conditions. These memory management assumptions are obsolete on the new platform. It is wise to write your own memory manager that provides a cushion from the operating system. Also, console games compiled for release are stripped of most error handling and don't gracefully handle running out of memory -- just a heads up.
Content: Your bottleneck will be system memory. Can you fit the necessary assets into memory? With textures, you can reduce mip levels where necessary, and with graphics hardware timing, you can pull in the far clipping plane. With assets resident in memory, you may need a technical artist to go through and reduce the face density of your models, or an animation programmer to implement a more size-friendly animation codec. This is very game-specific.
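Here is a small illustration of the fragile save pattern mentioned under Game Saving, with a hypothetical record whose on-disk layout is whatever the original compiler produced, so packing, endianness and type sizes on the new platform all have to match:

```cpp
// Raw struct dump: only portable if the target platform reproduces the
// exact same packing, byte order, and type sizes as the original build.
#include <cstdio>
#include <cstdint>

#pragma pack(push, 1)
struct SaveRecord {          // hypothetical layout from the original game
    uint32_t checksum;
    uint16_t level;
    uint8_t  lives;
    char     playerName[16];
};
#pragma pack(pop)

bool WriteSave(const char* path, const SaveRecord& record)
{
    FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    bool ok = std::fwrite(&record, sizeof(record), 1, f) == 1;
    std::fclose(f);
    return ok;
}
```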
You also run into the standard set of problems with things like bit compatibility (though the Wii and PSOne are both 32-bit), compiler idiosyncrasies, build script incompatibilities and proprietary compiler extensions.
Games are relatively challenging to test. A good rule of thumb is you want to have enough testers on your team to run through the game in a maximum of two days, covering all major aspects of play. In games that take a long time to beat (RPGs with 30+ hours of gameplay), your testing team needs to be quite large to offer full coverage. Because you are just doing a port, you can come up with a testing plan that maximizes coverage of your new code without having a testing team punch every wall in your game to make sure it (still) has clipping. The game shipped once.
Becoming a licensed developer requires you to apply. The turnaround time, from experience, is not good. Generally speaking, priority is given to studios with shipped titles and organized offices with reasonably good security and the ability to buy the (relatively) expensive development kits. You may be better off working with a licensed developer if you do not meet these criteria.
Console and game development is challenging for people already experienced in it. There is no book that covers it all. My recommendation is to attempt to recruit an expert who has experience shipping titles in a position of systems or engine programmer. What types of programmers and skillsets exist in games is a whole different question for Stack, though.
Games consoles don't use OpenGL but their own, custom libraries. The main reason is that they are pretty slow and have little RAM. So you need to squeeze out every drop of performance you can get. And that means: Custom code. Usually, you get a framework with the developer kit which gets you started and then, you build your code from that. Eventually, you'll start replacing parts from the developer kit with your own special code to get all the speed and special effects you need.
There is a reason why PSOne games are so ugly on the PS3 despite the fact that the developers have access to the sources: the revenue just doesn't justify touching the code.
Which is one reason why game development is so expensive: Every game is (more or less) a completely new product. Sometimes, game companies can reuse a bit of code from the last version but more often than not, they have to develop everything again. They also don't talk much with each other.
In recent years, kits have become more complex and powerful and you can get complete game engines (with all kinds of effects and 3D support) but each engine is a completely different kind of beast, so you can't even copy code from engine A to B.
Today, media content (video, audio and render sequences) are so expensive that the actual game engine is often a minor detail, so this isn't going to change any time soon.
Net result: If you want to port a game, write an emulator for the hardware (which is usually pretty simple and allows you to run all kinds of games).
[EDIT] To develop software for the Wii, see here: http://www.warioworld.com/
For a Wii emulator, see http://wiiemulator.net/
I ported a couple of games, when I was a new game programmer, from working with one version of our engine to a newer version (where backwards compatibility was neither ignored nor pursued). Even copying (and possibly renaming) the files and placing them in a home in the new project was a bit of work. Following that, the procedure was:
recompile
fix many of the hundreds of errors [in many places, with the same error occurring over and over again]
and
"wire up" calls from the new game engine to the appropriate calls in the old code
"wire up" function calls from the old code into the new game engine
deal with other oddities (ex. in the old game engine, the 2d game would "swizzle" textures itself; in the new version, the engine did it (on specific platforms))
and, while I don't recall this clearly, it was probably mixed in with a bunch of #ifdeffing out portions of code so the thing would actually compile, and possibly creating function stubs to be filled in later.
As I recall, it was three or four days until I had something that compiled. (But, it did help when we ported other games from the old version to the new one!)
The magnitude of the task will come down to what the code you are getting is like. If it has generic 3D calls that you can intercept -- add a thunking layer to -- then you are in business. It depends on the level of abstraction in the code. If it is well-behaved and has things like "RenderModel" and "RenderWorld" calls, you can replace those functions, and even the structures that they work with. If drawing is occurring all over the place, and calls are more like "Draw Polygon" and "Draw Line" or "Draw using this highly optimised data structure", then you are likely in for a long slog.
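As a hypothetical sketch of that thunking layer: keep the old engine's high-level entry points, but reimplement their bodies against the new renderer (all names below are made up for illustration):

```cpp
// Thunking-layer sketch: the old engine's high-level render calls keep their
// signatures, while their bodies forward to the new platform's renderer.
struct Model  {};    // placeholders for the old code base's types
struct Camera {};

namespace new_renderer {
    // Stubs standing in for the assumed new-platform (e.g. OpenGL ES) back end.
    void SetCamera(const Camera&)  { /* upload view/projection state */ }
    void SubmitMesh(const Model&)  { /* translate and issue draw calls */ }
    void EndFrame()                { /* flush / present */ }
}

// Old-engine entry points preserved, so existing game code compiles unchanged.
void RenderModel(const Model& model)
{
    new_renderer::SubmitMesh(model);
}

void RenderWorld(const Camera& camera, const Model* models, int count)
{
    new_renderer::SetCamera(camera);
    for (int i = 0; i < count; ++i)
        RenderModel(models[i]);
    new_renderer::EndFrame();
}
```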
You shouldn't need a Wii dev kit. Sometimes it is nice to verify that the code you are given does indeed compile in the original environment (and matches the shipping code!), but sometimes you can just take it on faith and make it work in its new environment.
Lastly, I don't think the Wii uses OpenGL, and I really don't know where to point you for further help.
What you may want to do is to start with designing the architecture of the game, write up a detailed specification for what the new game is like.
Once you have this, since you will be rewriting the code, you may find that some of the business logic that doesn't deal with the console can be ported over. But, anything dealing with I/O, user interaction or graphics/sounds will be rewritten, so you might as well do that from scratch.
A specification is very important, to make certain that you know how the current game is working so that the new port will give the same user experience, if that is what is desired.
You may want to keep the same bugs, if they are part of the experience: if I know that on the Wii I can jump down and bounce off the wall to land safely, then not being able to do that in the new version may be bothersome.
Well, porting a PS1 game to an iPhone would be quite a task; they work in very different ways. I'm sure it's doable, but it will be a LOT of work to replace all the fixed-point maths and to adapt the Z-buffer-less rendering to a real graphics chip.
The Wii would be a lot easier. The Wii API is very similar to OpenGL. However, the Wii has some very nice fixed-function features that just are not available on any other GL-based platform. Should be doable, though...
I'm not really sure I can say anything more than that. Have signed far too many NDAs over the years to be 100% sure of what I can and cannot say ;)
Still if you want to hire someone to do some porting work and are prepared to supply the required hardware then I might be free ;)