SMP MiniGL Layer Beta

Last updated September 21, 2023

If you thought there would never be Quake3 SMP support for 3dfx Voodoo based cards, think again. This project, which is simply called a SMP MiniGL Layer for now, aims to provide SMP rendering acceleration for these video cards and any other video cards whose drivers have problems with Quake3's built-in r_smp mode.

This first beta release of the SMP MiniGL Layer is intended for testing and performance analysis on non-hardware T&L video cards such as 3dfx Voodoo series cards, and NVIDIA TNT and TNT2 series cards. Although it may function with newer hardware T&L video cards, performance is likely to be poor (probably lower rather than higher). It's called a MiniGL layer because it does not handle a large number of OpenGL calls. This first beta release will work with a large number of Quake3 levels, but still does not handle all calls made by Quake3. It has also been tested a little bit with the RtCW MP Test.

If you have a SMP system running a Windows OS, a non-hardware T&L video card, Quake3 or the RtCW MP Test, and are interested in testing a first beta release of this SMP rendering acceleration software, then keep reading. Although NVIDIA TNT and TNT2 series video cards have drivers that work with Quake3's built-in r_smp mode, if you have one of these cards and would like to test this first beta release, I'd like to know how its performance and stability compare to Quake3's built-in r_smp mode across various system configurations with these video cards.

News

9/21/2023
Source code for the project: mtgl_src.2007_05_30.zip (191 KB). This is source code for a later version of the project than used to build the beta releases. It can be used without any restrictions from me. The 2007_05_30 source code was last built with MSVC6, SP5, with the Visual C++ 6.0 Processor Pack installed.

Known issues
The PerfMon window may not work due to a missing call to InitCommonControls(). With comctl32.dll version 6.0 or later, it looks like it would instead need to call InitCommonControlsEx() with the ICC_LISTVIEW_CLASSES flag.

Notes about using a newer compiler
I was able to build the 2007_05_30 source code with a newer compiler after making a few small changes, and it passed minimal testing. I used Visual C++ 2005 Express Edition and Microsoft Platform SDK 2003 SP1. The project contains a resource script. The Express Edition of the compiler does not come with include files needed by the resource script, but the Platform SDK does have them. The Express Edition of the compiler does not come with a resource editor, but it does have a resource compiler needed to compile the resource script.

There were some problems with the fast command queue pointer assembly code, which is present if both ENABLE_MULTIPLE_CONTEXT_COMMAND_QUEUE and USE_FAST_COMMAND_QUEUE_POINTER_ASSEMBLY_CODE are defined. The fast command queue pointer assembly code patches a large number of functions and depends on the compiler generating specific code that is compatible with the patching. Disabling the linker COMDAT Folding option was a simple way to avoid problems with one of the functions. For one other function where different code was generated, I quickly worked around the problem by replacing COMMAND_QUEUE_POINTER with GetCommandQueuePointer() and removing it from the list of functions to patch.

9/29/2002
I've been very busy with work lately, but I finally got around to putting up a page for the enhanced UT OpenGL renderer I had put together. It's a bit faster than the v4.36 one in a number of areas and supports some new options. Here's a link to the page. It still has some debug messages that I wanted to remove, but haven't had time to take care of yet. They're fairly harmless though and only print out a bunch of stuff during startup if a debug viewer is watching (so in most cases, it will never be noticed anyway).

I've been waiting for NVIDIA to post information about the new GL_NV_pixel_data_range extension. This could be very useful for texture uploading in the SMP MiniGL Layer. I probably still won't be able to get decent performance on modern hardware T&L video cards, but I'd still like to experiment with this one. Now if they only had support for fences that worked across multiple threads...

6/16/2002 - Beta 3 Released
SMP MiniGL Layer Beta 3 has been released. This release adds SSE memory streaming instructions to some code paths. If you have already tried the first or second beta release, this one installs the same way. To enable the SSE code paths, uncomment the "UseSSE = 1" line in the [CPU] section of the included mt_gl.ini file. Processors that support SSE instructions are required of course. The link to the beta 3 package is in the [Download] section.

Send any feedback and comparative benchmarks to smpdev@cwdohnal.com.

4/17/2002 - For anyone who still plays UT and uses the OpenGL renderer
I added a few tweaks to the UT OpenGL renderer. It can be downloaded from the utglr directory if you're interested in trying it out. It should run a little faster and I'll write up a short page about a couple of new options I added when I get a chance. The OpenGL renderer is in the utglr05.zip file. The source is in the utglr05src.zip file. Note that this is not the full source needed to build it, but only a copy of the source from the OpenGLDrv directory that includes my modifications and the 4.36 modifications. You'll need to get the rest of the UT public source from Epic for a complete build environment.

Send any feedback to smpdev@cwdohnal.com.

I ran a few SMP MiniGL Layer benchmarks with UT and the tweaked OpenGL renderer, but the results are pretty much like before. I can get marginal speedups on my dual P3 800 with a GeForce2 GTS on a number of benchmarks when using some newer experimental code that isn't completely stable. The only good speedups I ever got were on only a few specific demos on an old dual Celeron with a TNT2. If I didn't mess up anything, this tweaked OpenGL renderer by itself should be good for any system, single or dual, though.

12/2/2001 - Beta 2 Released
SMP MiniGL Layer Beta 2 has been released. This release is able to properly handle mirrors in Quake3 and it improves performance with a more efficient queuing architecture. If you have already tried the first beta release, this one installs the same way. The link to the beta 2 package is in the [Download] section. I'd recommend switching back to windowed mode if possible before installing this package for the first time just in case anything goes wrong. If you have not tried the first beta release, make sure to read all of the directions before trying out the new beta 2 release.

Send any feedback and comparative benchmarks to smpdev@cwdohnal.com.

This second beta release also handles a lot more OpenGL functionality than needed to just be a simple MiniGL layer for Quake3. If you are interested in helping me research the possibility of trying to graft SMP rendering acceleration onto other games that use OpenGL, send me an email and I can provide you with instructions needed to try to make it work with other applications. The installation process can be a little more complicated and there are various potential loading related problems that can pop up. I've done some work towards making the SMP MiniGL Layer work with the UT OpenGL renderer. It's had mixed results so far. Although I can get good performance increases on a few benchmarks, results are not so good on others.

11/28/2001
I'm going to try to get the next beta release out either this weekend or next. It will contain a much more efficient queuing architecture and a few faster code paths. It will also handle a much larger number of OpenGL calls and extensions. This opens up the possibility of trying it with other applications.

AGP and video memory benchmark released
The first version of a simple benchmark that can test CPU to AGP write performance has been released. It only works with GeForce class video cards because of the NVIDIA specific OpenGL extension it uses to allocate AGP and video memory. If you have a VP6 with two CPUs, I'm interested in any odd numbers you may get when running the AGP memory benchmark on CPU 0 and then on CPU 1. I think there may be a SMP related bug somewhere and would like to research this some more. If you have AGP Fast Writes enabled, the video memory benchmark can tell you how well they're really working on your chipset. On some chipsets, including mine, AGP Fast Writes don't work so well. As this is a feature I was thinking of using at one time for SMP MiniGL Layer optimizations, I'm interested in knowing if any chipsets out there actually have a good AGP Fast Writes implementation.

An initial v0.90 release of this benchmark program is available on the VAR Memory Benchmark page.

It may be possible to increase the performance of this first beta release by increasing the queue size. Try increasing the Size setting in the [Queue] section of mt_gl.ini to 2000000 (two million) from the default 300000 (don't use any commas in the .ini file). More efficient streaming code may make it into a future release. It should be able to fix some of the problems that prevent smaller queue sizes from performing well with the current queue architecture. I'm not sure how much of a difference this change will make, but if you do run a benchmark with both settings, please send in both results so I can see how they compare.

I'm working on putting together a simple benchmark program that can test AGP Fast Write performance on systems with NVIDIA GeForce class cards (it requires the GL_NV_vertex_array_range extension). This program will be able to show you how fast Fast Writes really are on your system. On my VP6/GeForce2 combo, Fast Writes at both 2X and 4X are fast kind of like the ViRGE was fast at 3D. Without Fast Writes, PCI mode CPU to AGP write performance is fairly poor.

I ran into a crazy problem while working on this benchmark. If I do continuous writes to AGP memory from CPU 0, everything is okay, but if I do continuous writes to AGP memory from CPU 1, it stops my system clock and makes it lose ticks. A lot of CPU intensive benchmarks that freeze the system while running will pause the clock or make it update slowly, but once it gets some CPU time, it catches up. In this one case, I'm seeing the clock stop when I start the benchmark and not catch up when the benchmark finishes. That's not good (and it messes up my MB/s calculations). I've noticed what looks like an inconsistency in how the MTRRs for the AGP aperture are set up across my two CPUs. Perhaps this has something to do with it. Maybe I should get the latest BIOS instead of the beta YT I'm running now, but I think the inconsistencies I'm seeing may have been there with some previous BIOSes too. I'm not entirely sure the BIOS is responsible for setting this stuff up, but there is a good chance it is.

It is the first beta...
While this software has gone through a significant amount of testing, keep in mind that this is the first beta release. If you do decide to try it, be ready for potential system crashes and/or the need to use the reset button because of a video subsystem possibly getting stuck in a bad state. Installation is not all that complicated, but it does require a number of steps. Be ready to copy a few files, make backup copies of config files, enter some Quake3 console commands, and enter a couple of things in a .ini file based on your system configuration.

How it works
The basic idea is to divide work across two processors by letting the application (Quake3 in this case) run on one processor and letting the OpenGL driver run on a second processor. There is some overhead in queuing and synchronizing data, but if the benefit of running things in parallel outweighs the cost of the overhead, SMP rendering acceleration is possible. If the overhead is too high or if dividing the work at this point does not cause it to be divided very evenly, then it may cause lower performance. This means that this technology, by design, will only work properly in certain scenarios. Like Quake3's built-in r_smp mode, this project cannot provide SMP rendering acceleration when video card fill rate rather than CPU speed becomes the limiting factor. Unlike Quake3's built-in SMP mode, this layer doesn't require proper multiple thread/multiple rendering context support in the OpenGL driver (which some broken or incomplete OpenGL and MiniGL drivers do not provide).
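
The split described above can be pictured as a producer/consumer queue. The following is only a rough sketch in modern C++, not code from the SMP MiniGL Layer itself: the CommandQueue class, run_frame(), and the use of std::function as a stand-in for recorded OpenGL calls are all made up for illustration, and the real layer's queue format and synchronization are not shown here.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical illustration only: none of these names come from the
// SMP MiniGL Layer source code.
class CommandQueue {
public:
    explicit CommandQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Application thread: record a call instead of invoking the driver.
    // Blocks while the queue is full (this wait is part of the overhead).
    void push(std::function<void()> cmd) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return queue_.size() < capacity_; });
        queue_.push_back(std::move(cmd));
        not_empty_.notify_one();
    }

    // Driver thread: fetch the next call. Returns false once the queue
    // has been closed and fully drained.
    bool pop(std::function<void()>& cmd) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return closed_ || !queue_.empty(); });
        if (queue_.empty()) return false;
        cmd = std::move(queue_.front());
        queue_.pop_front();
        not_full_.notify_one();
        return true;
    }

    // Application thread: signal that no more calls will be queued.
    void close() {
        std::lock_guard<std::mutex> lock(mutex_);
        closed_ = true;
        not_empty_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::deque<std::function<void()>> queue_;
    std::size_t capacity_;
    bool closed_ = false;
};

// Pushes `count` stand-in "GL calls" from the calling thread and executes
// them on a second thread. Returns the order in which they were executed.
std::vector<int> run_frame(int count, std::size_t capacity) {
    CommandQueue queue(capacity);
    std::vector<int> executed;  // written only by the driver thread

    std::thread driver([&queue] {
        std::function<void()> cmd;
        while (queue.pop(cmd)) cmd();
    });

    for (int i = 0; i < count; ++i)
        queue.push([&executed, i] { executed.push_back(i); });

    queue.close();
    driver.join();  // after the join it is safe to read `executed`
    return executed;
}
```

Calling run_frame(1000, 8) runs a thousand placeholder calls through an eight-entry queue. The calls execute in the order they were queued, which is the property a real layer needs so that the driver sees the same call sequence the application issued.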

Current test status
The SMP MiniGL Layer has been successfully tested and benchmarked in systems with a single Voodoo2, a Voodoo2 SLI, and a TNT2 series video card. This first beta release has only been tested on Win2k, but there is a good chance that it will run okay on WinNT 4 or WinXP.

Beta 1 goals
- Ensure that this project works correctly with newer 3dfx Voodoo series video cards. While I don't expect any major problems as the project works okay with various Voodoo2 OpenGL drivers and OpenGL drivers for a few other video cards, it needs to be tested with these video cards to know for sure.
- Performance testing on newer Voodoo series video cards. This project only works well with non-hardware T&L video cards. Systems with newer 3dfx Voodoo series video cards are expected to have the best chance of benefiting from this project right now.
- Performance and stability comparisons on SMP systems with NVIDIA TNT and TNT2 series video cards if there are enough people who still have these older system configurations.
- Testing and benchmarking with other non-hardware T&L video cards if there is any interest.

Performance expectations
One of the goals of this beta release is to get a better idea of how it performs across various system configurations. Limited internal benchmarking shows that the following general scenarios are possible.
- If fill rate bottlenecks are not a major factor, performance improvements of 30%+ should be possible on a number of common Quake3 demos.
- As fill rate rather than CPU speed bottlenecks become dominant, the attainable performance improvement will decrease.
- In cases where fill rate is the bottleneck throughout the entire demo, there will be no performance increase or perhaps a slight performance decrease. This means this project will not be able to benefit system configurations with sufficiently fast CPUs coupled with sufficiently slow video cards. Video resolution makes a big difference in determining where this cutoff occurs.
- There are some known and probably some as yet unknown bad cases for this technology where it may decrease performance significantly even when used with non-hardware T&L video cards.
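
To make the scenarios above a little more concrete, here is a deliberately oversimplified per-frame model. It is an assumption made for illustration, not a measurement and not how the layer was benchmarked: it treats a frame as application CPU time, driver CPU time, video card fill time, and a fixed queuing overhead charged to the busier CPU. All names (FrameCost, estimated_speedup, and so on) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-frame costs in milliseconds (illustrative model only).
struct FrameCost {
    double app_ms;       // game/application CPU work
    double driver_ms;    // OpenGL driver CPU work
    double fill_ms;      // time the video card needs to fill the frame
    double overhead_ms;  // queuing and synchronization cost
};

// One CPU: application and driver work run back to back.
double single_cpu_frame_ms(const FrameCost& c) {
    return std::max(c.app_ms + c.driver_ms, c.fill_ms);
}

// Two CPUs: application and driver overlap, so the busier CPU
// (plus the queuing overhead) or the video card sets the pace.
double dual_cpu_frame_ms(const FrameCost& c) {
    double busiest_cpu = std::max(c.app_ms, c.driver_ms) + c.overhead_ms;
    return std::max(busiest_cpu, c.fill_ms);
}

// Values above 1.0 mean the two-CPU split is faster in this model.
double estimated_speedup(const FrameCost& c) {
    assert(c.app_ms >= 0.0 && c.driver_ms >= 0.0);
    assert(c.fill_ms >= 0.0 && c.overhead_ms >= 0.0);
    double dual = dual_cpu_frame_ms(c);
    assert(dual > 0.0);
    return single_cpu_frame_ms(c) / dual;
}
```

With made-up costs of 8 ms application, 5 ms driver, 6 ms fill, and 1 ms overhead, the model gives 13 ms per frame on one CPU and 9 ms on two. Raising the fill time to 20 ms makes both cases 20 ms, so the gain disappears. With a lopsided split such as 10 ms application and 1 ms driver with 2 ms overhead, the two-CPU case comes out slower (12 ms versus 11 ms).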

Download information
All documentation is currently on this page that you're reading right now. First, save a copy of it for quick reference if needed.
Next, read the following terms and conditions, and if you agree to them, download the beta 1 package.
- The files contained in this beta package are provided as is.
- They are believed to be virus free (but you should scan them anyway).
- Do not publicly redistribute this beta package.
- It's a first beta release. There are some known issues, but currently no serious known issues. Installing this beta could lead to crashes and the need to reset your system. Be prepared for this (it's safest if you save anything you might have open and restart your system before installing this beta).
- You have read and understand the note to overclockers.
- You have read and understand the following pre-installation/uninstallation and installation instructions.
- Feedback from this first beta release is important. If you do decide to try it, please send a report back with information about how well it worked and if any problems were encountered.
Beta 1 package: smp_minigl_layer_beta1.zip (45 KB).
Beta 2 package: smp_minigl_layer_beta2.zip (51 KB).
Beta 3 package: smp_minigl_layer_beta3.zip (52 KB).

Contact information
If you have any questions or comments about this beta release, send me an email at smpdev@cwdohnal.com.

Feedback from this first beta release is important. While a detailed set of benchmarks is nice, I don't expect that everyone will be interested in running and sending these in. If you just want to send in a quick report about how it worked on your system, send any performance observations and a little bit of system config information to smpdev@cwdohnal.com and put something like "SMP MiniGL Layer Beta 1 test report" in the subject line. If you have any questions and would like a reply, you should still send an email to smpdev@cwdohnal.com and just make sure to use a different subject line.

Note to overclockers
As you should already know, you are responsible for any damage that might occur from running your hardware out of specification. While it is unlikely that using this beta release could cause any permanent hardware damage even on a highly overclocked system that otherwise works okay, it has the potential to push certain parts of a multiprocessor system harder than a lot of other software. If you have a system with certain components running on the edge, it may be necessary to reduce their speed some to run this beta release reliably. This beta release may do a lot more interprocessor data transfer combined with increased main memory accesses compared to many other applications you have run on your system. If it works right, it will speed up video performance too, which means that there will be more bus traffic going to the video card and more stress on the video card. Keep these things in mind before installing this beta release and take appropriate action before installing it if necessary. Also, I'd prefer that reports of any instability problems are not the result of hardware running out of specification. So, if you have trouble with it crashing on a highly overclocked system, please try it out at a lower clock speed before making a final report about the issue.

Pre-installation/Uninstallation instructions
In case something goes wrong, a backup of your Quake3 config file will allow for easy uninstallation.
For Quake3, go to the baseq3 subdirectory and make a backup copy of your q3config.cfg file.
For the RtCW MP Test, go to the demomain subdirectory and make a backup copy of your wolfconfig.cfg file.
If there is an installation problem, or if other uninstallation methods fail, restoring the backup copies of these config files should restore things to the previous working state. The uninstallation can then be completed by removing the two files installed from the beta package (though at this point, restoring the config file should have already disabled it and returned your configuration to a previously working state).

Installation instructions
1. Put the two files, mtgl32.dll and mt_gl.ini, from the beta package in your Quake3 directory. To install this beta with the RtCW MP Test, just put these files in your Wolf MP Test directory instead. Besides a few naming differences that are either specifically mentioned or should be easy to figure out, the installation instructions for the RtCW MP Test are the same as for Quake3.

2. Switch Quake3 to a windowed video mode if possible. If anything goes wrong during the installation process, having it running windowed will make it much easier to see what went wrong and to fix it.

If you have a Voodoo2 or Voodoo Graphics based board hooked up to a monitor with multiple inputs instead of using the passthrough, this will also make it easier to recover from any errors that occur. If you don't have an easy way to switch between video outputs and something does go wrong that leaves you with a blank gray screen, try pressing enter one or two times to close any error message boxes that you may not be able to see. This might get you back to a point where control of the system can be regained instead of having to reset it.

Since error recovery with full screen only 3D cards like the Voodoo2 and Voodoo Graphics is more complicated, if something doesn't look quite right during the install and setup process, feel free to send an email with any details and questions to smpdev@cwdohnal.com. If you have trouble getting the SMP MiniGL Layer up and running on a Voodoo2 or Voodoo Graphics based board and are interested in spending some time debugging it, send me an email and I can help you with getting it running windowed with WinGlide. If you know how to do this already, that would be great, but getting WinGlide working with 3dfx's newer OpenGL drivers (the ones that work with Quake3 and use glide3x.dll) is a real pain.

3. Open the mt_gl.ini file in a text editor. Depending on your system configuration, you may not need to make changes, but it's still good to see what's there. The OpenGL32 parameter in the [Main] section must specify the name of the OpenGL driver to use.

For most video cards, which use an ICD, the default setting of "opengl32.dll" should work. The DoGDICalls parameter should be set to "1" if the OpenGL32 parameter is set to "opengl32.dll".
In the [Main] section of mt_gl.ini
OpenGL32=opengl32.dll
DoGDICalls=1

For Voodoo2 or Voodoo Graphics video cards, which use a standalone OpenGL driver, the OpenGL32 parameter needs to be set to "3dfxvgl.dll" and the DoGDICalls parameter must be set to "0".
In the [Main] section of mt_gl.ini
OpenGL32=3dfxvgl.dll
DoGDICalls=0

4. Disable Quake3's built in SMP mode if necessary. The console command "r_smp 0" will do this. This DLL layer only works when Quake3's built in SMP mode is disabled.

5. Exit Quake3 now before continuing.

6. Start Quake3 again and bring down the console.

7. Enter the command "r_gldriver" in the console to see what the current OpenGL driver is set to. It should be set to "opengl32" for most video cards. If you have a Voodoo2 or Voodoo Graphics video card, it should be set to "3dfxvgl". If it is not set to either of these values, stop here and send me an email with the details.

8. Enter the command "r_gldriver mtgl32" to tell Quake3 to use the SMP MiniGL Layer. To make this take effect, enter the command "vid_restart" at the console, or exit and restart Quake3.

9. If you need to switch back to not using the SMP MiniGL Layer, enter the command "r_gldriver opengl32" or "r_gldriver 3dfxvgl" to select the driver you were using before. If you can't run Quake3 to make this change, restore your config file as described in the Pre-installation/Uninstallation instructions.
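
The pairing between the OpenGL32 and DoGDICalls settings from step 3 is easy to get wrong when editing mt_gl.ini by hand. As a small sketch of the rule stated there, the check below encodes just the two combinations these instructions describe. The function and enum names are hypothetical and are not part of the beta package, and the exact, case-sensitive string comparison is an assumption made to keep the example short.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of the SMP MiniGL Layer.
enum class MainCheck { Ok, Mismatch, UnknownDriver };

// Checks the [Main] section values against the two documented pairings:
//   OpenGL32=opengl32.dll  with  DoGDICalls=1  (ICD-based video cards)
//   OpenGL32=3dfxvgl.dll   with  DoGDICalls=0  (Voodoo2 / Voodoo Graphics)
MainCheck check_main_section(const std::string& opengl32, int do_gdi_calls) {
    assert(do_gdi_calls == 0 || do_gdi_calls == 1);
    if (opengl32 == "opengl32.dll")
        return do_gdi_calls == 1 ? MainCheck::Ok : MainCheck::Mismatch;
    if (opengl32 == "3dfxvgl.dll")
        return do_gdi_calls == 0 ? MainCheck::Ok : MainCheck::Mismatch;
    // Any other driver name is outside what these instructions cover.
    return MainCheck::UnknownDriver;
}
```

A result of UnknownDriver does not mean the configuration is wrong, only that these instructions say nothing about that driver name.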

Beta 1 feedback requests
Basic "did it work or not" information. Any installation or configuration difficulties.

System information.
CPUs:
Chipset:
Memory:
Video card:

Benchmark comparisons between Quake3 r_smp 0 and Quake3 using the SMP MiniGL Layer with a standard built-in Quake3 demo. Feel free to include benchmark comparisons from a couple of other demos if you'd like. Benchmark comparisons from the same demo at two or three different resolutions would be useful. A low resolution benchmark at 640x480 or 512x384 will be useful for evaluating the efficiency of the SMP MiniGL Layer on your system as it will reduce fill rate bottlenecks. A benchmark at the resolution you usually play at will be useful for seeing how much of a real world performance improvement, if any, it can provide on your system using your preferred settings. If you try it with the RtCW MP test and don't happen to have a benchmark handy, a rough estimate from the frame rate counter at approximately the same map position will work too (use the cg_drawfps setting from the console to turn the frame rate display on or off).

Benchmark information.
Demo:
Resolution:
Bit depth:
r_smp 0 frame rate:
SMP MiniGL Layer frame rate:
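
If you'd like to include a percentage along with the two frame rates, one straightforward way to compute it is shown below. This is just a convenience sketch. The function name is made up, and it assumes the "r_smp 0" frame rate is used as the baseline.

```cpp
#include <cassert>

// Hypothetical helper: percent change of the SMP MiniGL Layer frame rate
// relative to the r_smp 0 frame rate. Positive means the layer was faster.
double percent_change(double baseline_fps, double layer_fps) {
    assert(baseline_fps > 0.0);
    return (layer_fps - baseline_fps) / baseline_fps * 100.0;
}
```

For example, a baseline of 50 fps and a layer result of 65 fps works out to roughly a 30% improvement, while 60 fps dropping to 57 fps gives a negative value.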

Any other interesting observations. Smoothness, feel, does it seem laggy or out of sync in any way, etc.

All Beta 1 feedback information needs to be sent by email.
Send SMP MiniGL Layer Beta 1 feedback to smpdev@cwdohnal.com.