Archive

Posts Tagged ‘code’

Sharif CGS 2010 Presentations

April 23rd, 2010

(Well, one of them for now…)
I thought I’d post the presentation I used for my talk in the conference on game development held at Sharif University a couple of months back.
It’s about the decisions you have to make and the things that you should do at the beginning of a game project, to make your team’s and your own lives easier later on and throughout the development cycle. This is an ongoing experience and collection of ideas for me, so I’ll be looking forward to any suggestions, discussions, critique, humiliation, praise and/or whatnot!


Run Please! (or how do we enjoy a cup of coffee while the bugs are being chased.)

March 26th, 2010

In the past weeks, a very common sight at Fanafzar was a series of 4 or 5 machines, all running Garshasp on a pre-recorded command sequence (or timedemo, or whatever you might want to call it,) trying to get the game to crash or fire an assertion or behave erratically to help us pinpoint some intermittent or hard to reproduce bug.

First of all, we have a quite cool replay feature in Zorvan (our engine) that lets us record and then play back a game session. It’s not perfect, and not quite fit for end-users, but you wouldn’t imagine how useful it has been (and will be) to us.

This replay system is serving as our unit test (“You added that feature? Let’s see if the boat sequence is playable now.”) and our regression test (“You committed that fix? Let’s see if the game is still playable!!!”) and our performance test and playability test and much more. Since the structure of our game is linear by design, we can get very good coverage with a straightforward replaying of the entire game.
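
The post doesn't describe how Zorvan stores its replays, so the following is only a rough sketch of the general record-and-play-back idea, not the engine's actual format. The names `replay_log`, `replay_record` and `replay_next` are made up for illustration. It assumes commands are recorded in frame order and that playback visits every frame of a deterministic simulation.

```c
#include <stddef.h>
#include <stdint.h>

#define REPLAY_CAPACITY 1024

/* One recorded input: which frame it happened on and what it was. */
typedef struct {
    uint32_t frame;
    uint32_t command;
} replay_event;

typedef struct {
    replay_event events[REPLAY_CAPACITY];
    size_t count;   /* number of events recorded so far */
    size_t cursor;  /* next event to hand out during playback */
} replay_log;

/* Recording: append a command for the given frame. Returns 0 when full. */
int replay_record(replay_log *log, uint32_t frame, uint32_t command)
{
    if (log->count >= REPLAY_CAPACITY)
        return 0;
    log->events[log->count].frame = frame;
    log->events[log->count].command = command;
    log->count++;
    return 1;
}

/* Playback: call repeatedly on each frame until it returns 0.
   Returns 1 and stores the command if one is scheduled for this frame. */
int replay_next(replay_log *log, uint32_t frame, uint32_t *command)
{
    if (log->cursor < log->count && log->events[log->cursor].frame == frame) {
        *command = log->events[log->cursor].command;
        log->cursor++;
        return 1;
    }
    return 0;
}
```

During playback the game loop would ask `replay_next()` for commands instead of polling the input devices, which is what lets several machines grind through the same session unattended.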

In any case, since we are almost feature-frozen now, our (programmers’) lives mostly consist of running the game till it crashes (or does something it shouldn’t do) and then tracking down the bug and working it out.

I guess the next step will be finding the few major performance bottlenecks and optimizing them (that we have put off till now because they would have made debugging and adding features quite hard.)

My point in all this was that debugging is usually considered a gruesome and intimidating task, or boring and uninteresting at best. Right now, I quite enjoy debugging our engine and game for two main reasons: the first is the replay system (which makes debugging much more effective, targeted and efficient) and the second, more important one is finding bugs in my own (and our own) mental processes by finding bugs in the code that resulted from those processes.

It can be illuminating to find out what you had missed when you designed or implemented a piece of code, or the bugs caused by lack of communication or a problem in the general work flow (these are not common, but interesting nonetheless.) This form of revelation that results from finding a bug in your code is quite a rush and can make us better programmers.


Time, Only Time

May 9th, 2009

Keeping track of time is a sensitive problem. Designing a good system for this is, for many reasons, one of the most basic and most crucial tasks a game engine developer has. One of those reasons is that most (if not all) game engines are simulation engines at their core, and without proper treatment and handling of time, nothing much else can be handled and treated.

Anyways, I’m not going to talk about the design of the time system that we have implemented in Zorvan (that’s what we call our game engine) but about its implementation, and not about the whole implementation, but rather about how we actually read time from the system, and the pitfalls and problems associated with it.

Basically, when writing your programs in C, and on Windows on a PC, you have a range of options for reading the absolute time. The basis of this absolute time is not important because we’ll be only working with time deltas and almost never with the value of the time itself. Let’s call each source of such an absolute time a “time source”(!)

There are a few parameters that we should be concerned about in a time source. The first is precision: the frequency of the source, i.e. the smallest time value that it can actually measure, or the number of meaningful digits in its return value. The next one is update interval. For example, a time source may advertise that it can measure time in micro- or nanoseconds, but its value may only change (or be updated) once a millisecond. The third parameter is the overhead of actually reading time from that source. If a source writes data to a disk file only and you have to read it from there, it’s gonna be no good for you whether it measures time in femtoseconds or not. You are going to spend a few milliseconds reading that number anyway (if not more) and it won’t be any good then.

The last two parameters can be combined, but I thought I’d make a distinction because they are indeed different to a programmer and they present different symptoms in the application.
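
To make the difference between update interval and call overhead concrete, here is a minimal sketch of how both could be measured for any source that can be wrapped in a function returning ticks. It is not the benchmark code behind the numbers later in this post. `simulated_coarse_source()` is a made-up stand-in so the example behaves the same everywhere, and the reference clock is C11's `timespec_get()`, which is wall-clock time and only an assumption of convenience.

```c
#include <stdint.h>
#include <time.h>

/* Any function returning the current reading of a time source, in ticks. */
typedef uint64_t (*time_source_fn)(void);

/* Hypothetical coarse source: it pretends to count milliseconds, but its
   value only moves once every 1000 reads, and then in steps of 15. */
uint64_t simulated_coarse_source(void)
{
    static uint64_t calls = 0;
    return (calls++ / 1000u) * 15u;
}

/* Update interval: the smallest forward step seen between successive reads.
   Spins until the value has changed the requested number of times, so it
   never returns for a source whose value never changes. */
uint64_t min_value_jump(time_source_fn read_time, int changes_to_observe)
{
    uint64_t smallest = UINT64_MAX;
    uint64_t previous = read_time();
    int seen = 0;
    while (seen < changes_to_observe) {
        uint64_t current = read_time();
        if (current == previous)
            continue;               /* the source has not updated yet */
        if (current > previous && current - previous < smallest)
            smallest = current - previous;
        previous = current;
        ++seen;
    }
    return smallest;
}

/* Call overhead: average cost of one read, in microseconds. */
double call_overhead_us(time_source_fn read_time, long iterations)
{
    struct timespec begin, end;
    volatile uint64_t sink = 0;     /* keeps the reads from being optimized away */
    timespec_get(&begin, TIME_UTC);
    for (long i = 0; i < iterations; ++i)
        sink += read_time();
    timespec_get(&end, TIME_UTC);
    (void)sink;
    double elapsed_us = (double)(end.tv_sec - begin.tv_sec) * 1e6
                      + (double)(end.tv_nsec - begin.tv_nsec) / 1e3;
    return elapsed_us / (double)iterations;
}
```

A source can look excellent on one of these numbers and terrible on the other, which is why they are kept apart here.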

OK, enough with the pep talk. Let’s get down to business. The most obvious time sources are the CRT clock(), the Win32 API’s GetTickCount() and the Win32 Multimedia API’s timeGetTime(). These are all millisecond sources. That is, they all make you think they have millisecond precision. On Windows, the first two actually have an awful update interval of 15-16 milliseconds (any old DOS programmer should find this number painfully nostalgic!) Of course, this number is not fixed and you should not assume it, but I don’t remember encountering any different behaviour from these two calls in recent years. The multimedia timer actually does have millisecond precision and it does indeed get updated every actual millisecond, but the API documentation asserts that you cannot count on that either (though you can ask the OS to make an effort to update this timer value at whatever multiple of a millisecond you want.) Among these three, the multimedia timer is the only one that actually updates every millisecond, and its call overhead is still small (tens of nanoseconds in my measurements), so it is obviously the best of them.

In any case, millisecond precision is not enough for most of the timing needs of a game engine, but it may be enough for the most important usage of measuring the frame time. For games that have a framerate below 100, this should work out well enough for now, but you should think about the future too. In general, don’t use this to measure any duration less than a second in a game.

The next time source on Windows, with more precision and supposedly better behavior, is the high performance timer (or whatever) aka QueryPerformanceCounter(). This allegedly uses one of the hardware counters on your system and produces close to microsecond (or a little better) precision. The frequency of this time source is not fixed from system to system and you have to query it from the OS (using QueryPerformanceFrequency()) at runtime, but it is guaranteed not to change while the computer is up and running (I don’t know what happens across system hibernations, and frankly, I don’t want to find out!) On my system, and a few others I’ve tested, the frequency of this source is 3’579’545 ticks per second, which gives it a precision of about 280 nanoseconds. However, as my tests show, each invocation of this function takes about 2 microseconds (on my test system) which makes the overhead and latency rather high compared to its precision. Still, for timing frames and other non-critical code (out of your precious inner loops) this is probably your best bet.
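
As a small worked example of the arithmetic involved, the sketch below turns a tick delta into seconds using the queried frequency. The helper names (`read_performance_counter`, `ticks_to_seconds`, `tick_length_ns`) are invented for this example, and the Windows-only part is fenced off with `_WIN32` so the conversion helpers can be compiled anywhere.

```c
#include <assert.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>

/* Raw counter value in ticks, or 0 if the call fails. */
int64_t read_performance_counter(void)
{
    LARGE_INTEGER ticks;
    return QueryPerformanceCounter(&ticks) ? ticks.QuadPart : 0;
}

/* Ticks per second; query once at startup and keep the value around. */
int64_t read_performance_frequency(void)
{
    LARGE_INTEGER frequency;
    return QueryPerformanceFrequency(&frequency) ? frequency.QuadPart : 0;
}
#endif

/* Converts a tick delta to seconds. Whole seconds and the remainder are
   converted separately so large deltas keep their fractional precision. */
double ticks_to_seconds(int64_t tick_delta, int64_t ticks_per_second)
{
    assert(ticks_per_second > 0);
    int64_t whole = tick_delta / ticks_per_second;
    int64_t rest = tick_delta % ticks_per_second;
    return (double)whole + (double)rest / (double)ticks_per_second;
}

/* Length of a single tick in nanoseconds. */
double tick_length_ns(int64_t ticks_per_second)
{
    assert(ticks_per_second > 0);
    return 1e9 / (double)ticks_per_second;
}
```

With the frequency quoted above, `tick_length_ns(3579545)` comes out at roughly 279.37, which is where the 280 nanosecond figure comes from.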

One largely curious (not to mention disturbing) behavior we observed recently in Zorvan while using this method for time calculations was that it returned the same number in two invocations a few milliseconds apart which resulted in all sorts of erratic behaviors, but I haven’t had time to investigate it in depth and I haven’t been able to reproduce it again.

Perhaps the most pervasive method for obtaining time in game engines is using the rdtsc (Read Time-Stamp Counter) x86 instruction, which has been available since the original Pentium CPUs. It has no parameters and returns the number of CPU cycles passed since the CPU was restarted as a 64-bit number in EDX:EAX. The instruction is lightweight and low-overhead, has a very high precision (almost the highest precision possible, because any lapse of time smaller than the CPU clock cycle is hardly meaningful or measurable in general computing) and is available everywhere (on all PCs anyway, and it’s obviously not Windows-specific.)

For those who are afraid of inline assembly, there is even a convenient intrinsic available in Visual C++ (include “intrin.h” and call __rdtsc().) (The GCC inline assembly call is left as an exercise for the reader!)
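
For readers who would rather skip the exercise, one possible shape of such a wrapper is sketched below. The names `read_tsc` and `combine_edx_eax` are made up here. The GCC branch uses extended inline assembly and assumes an x86 or x86-64 target; on anything else the function simply returns 0, which is a placeholder and not a real time source.

```c
#include <stdint.h>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#endif

/* rdtsc leaves the high 32 bits in EDX and the low 32 bits in EAX. */
uint64_t combine_edx_eax(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Current value of the time-stamp counter, in CPU cycles. */
uint64_t read_tsc(void)
{
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    return __rdtsc();                       /* Visual C++ intrinsic */
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t eax, edx;
    __asm__ __volatile__("rdtsc" : "=a"(eax), "=d"(edx));
    return combine_edx_eax(edx, eax);
#else
    return 0;                               /* no rdtsc on this target */
#endif
}
```

Turning the returned cycle count into seconds still requires knowing the CPU clock rate, which has to be measured or queried separately.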

But nothing is free. There are a few problems attached to the use of rdtsc which fall mostly in two categories: problems caused by multi-CPU systems and those that result from its interaction with CPU power saving schemes.

In multi-CPU systems (including multi-cores,) the different CPUs may not have started counting cycles at exactly the same time, which causes the results read from different cores to be slightly different. This can happen very easily because normally your code can run on different CPUs at different times (across task switches) and can cause the time seen by the application to appear to go backwards (imagine trying to feed a negative time delta to your physics solver!) The first time I saw this was on an AMD Athlon 64 X2. Although the problem is solvable with an official patch from AMD, it has left me always afraid of having to encounter it again! In any case, I haven’t seen this problem on Intel Core architecture CPUs and I haven’t had access to an AMD Phenom or Opteron to test.
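
Whatever the cause, the symptom can at least be contained before it reaches the simulation. The following is a minimal sketch of a defensive filter (the names `monotonic_filter` and `monotonic_delta` are hypothetical) that never hands out a negative delta. It does not fix the underlying counter mismatch; it only turns a backwards jump into a zero-length step, which the caller still has to tolerate.

```c
#include <stdint.h>

typedef struct {
    uint64_t last_ticks;    /* latest reading accepted so far */
} monotonic_filter;

/* Returns the ticks elapsed since the last accepted reading.
   A reading that is equal to or older than the last one yields 0
   and is otherwise ignored, so time never appears to run backwards. */
uint64_t monotonic_delta(monotonic_filter *filter, uint64_t raw_ticks)
{
    if (raw_ticks <= filter->last_ticks)
        return 0;           /* time stood still or went backwards */
    uint64_t delta = raw_ticks - filter->last_ticks;
    filter->last_ticks = raw_ticks;
    return delta;
}
```

Initialize `last_ticks` with a first reading of the time source, then pass every later reading through `monotonic_delta()` before computing frame times.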

The second group of problems happens because modern CPUs may vary and adapt their clock rates at runtime to different loads (this is quite common on mobile-class CPUs especially.) This means that when you measure your CPU clock rate (e.g. at the start of your application) it may not stay the same during the lifetime of your application and may go down or up (if the initial load was low.) In either case, it will wreak havoc on your time calculations.

But don’t despair! While solving the first issue is rather ugly in applications (you have to bind your time-reading thread to a single CPU (it’s called setting the “CPU affinity” for that thread; STFW yourselves)) and solving the second problem is impractical in application code, the second category of problems can be solved rather easily (IMHO) in the CPU itself. It just has to always report the time-stamp counter value according to the highest clock rate. And I suspect that CPUs actually do this, because I haven’t encountered problems of the second category yet and it’s only a theoretical problem for me for the time being. I may just be lucky, but I don’t believe so!

Anyway, for your reference and purposes of comparison, I have measured some relevant timing values for the 5 time sources discussed above on my laptop (a T7200 CPU, i.e. Intel Core 2 Duo, 2GHz) which are presented in the table below:

| Time Source | Call Overhead (microseconds) | Minimum Value Jump In Two Successive Calls | Frequency (Hz) | Precision (MHz⁻¹) |
|---|---|---|---|---|
| clock() | 0.03822 | 15 | 1000 | 1000 |
| GetTickCount() | 0.00313 | 15 | 1000 | 1000 |
| timeGetTime() | 0.02026 | 1 | 1000 | 1000 |
| QueryPerformanceCounter() | 1.921 | 5 | 3579545 | 0.279365 |
| rdtsc | 0.000516 | 60 | 2000000000 | 0.0005 |

*: Each number is measured and averaged over about 100 million iterations.
There are of course a couple more methods of reading time from other hardware sources, but their availability and parameters are rather system dependent and I won’t go into the topic any further.

Categories: Engine

Behavior Tool – Seamlessness

April 27th, 2009

We have felt the need for a specific tool in our Editor for so long now: something to enable us to focus on a specific character and tweak its properties, whether its behaviors, animation, physics or any other property. We never quite dedicated the time to implement the features we needed. However, some bursts of inspiration struck me recently when I went over the Havok Behavior Tool, which is a very nice and neat product.

Our plan for this week is to implement a few features in our editor, Iranvij, which will enable us to view a character in an isolated window, run the different animations with a time tracker and simulate this character alone to view the different behaviors from different states. This will be a big addition to Iranvij. The state chart for every character can be edited visually thanks to the new data driven Hierarchical Finite State Machines used for the characters and the visual graph features in our editor.

Another major feature which is being added to the game engine is the Seamlessness feature which will enable huge game worlds to be loaded and unloaded on the fly without any load screens halting the game experience. The main parts have already been implemented in the game engine, Zorvan, and now the necessary tools have to be added to Iranvij. This is going to be a big addition to the engine capabilities.

Categories: Editor, Engine

Render Pipeline Rewrite

April 26th, 2009

We’re in the process of redesigning and rewriting the CPU and GPU-side code for Garshasp’s rendering pipeline. What I’m thinking about (and what we more-or-less have already) is an HDR, per-pixel-lit rendering with (pretty slow) shadow maps.
The new pipeline that I’m thinking about is like this:

  1. Shadow Pass (once per shadow-casting light that affects the frustum):
    Render all shadow-casting geometry from the light view point and keep all the depth values in the render target.
    Hardware requirements: FP32 or (at least) FP16 texture support. I can pack the depth value in an A8R8G8B8 texture if this turns out to be a limiting requirement.
    Notes: OGRE handles this by default, and it probably does a better job than me (I never got the hang of all the different methods for PSM frustum calculations. :) )
  2. Depth pass:
    Do an initial rendering of all opaque geometry to initialize the Z-buffer and also write the view-space pixel depths to a floating-point texture (because we SM3.0-era PC-developers can’t read from the Z-buffer directly.)
    Hardware requirements: FP32 or (at least) FP16 texture support. However, I can pack the depth value in an A8R8G8B8 texture if this turns out to be a limiting requirement.
    Notes: I should investigate interactions of this with MSAA.
  3. Shadow Map Generation Pass:
    Render all shadow-receiving geometry and calculate whether each pixel is in shadow or not (and how much) using the information from the last two passes. The PCF and gang should be run here. If we decide to allow multiple shadow casting lights, we can do this calculation for four of them in this pass and write the result in different components of a single A8R8G8B8 render target. More will need MRT.
    Hardware Requirements: Nothing special.
    Notes: OGRE claims that it handles this by default. But it also provides an “Integrated Shadow” option which should give me more control (and more chances to mess things up!) I should think about whether I can integrate this pass with the last one. The only problems I see are the MRT prerequisite and the different render target bit-depth requirements (8 vs. 32.)
  4. Render Pass:
    Render the glorious scene including the translucent objects and objects that need special treatment (fog volumes, light volumes, water,) using the shadow map generated in the last pass and the depth values from the pass before that (for the special effects.)
    Hardware Requirements: Will probably need SM3.0 or SM2.x if we want to support more than one or two lights. This is preferred to turning this into multiple passes, because of the number of our triangles and the large number of animations (rather costly vertex programs.) Also, will require FP_ARGB_16 for the render target; I’m hoping that every SM2.x card supports this with ease.
    Notes:
  5. HDR Bloom:
    Very effective visually, although requires many, many passes (the number is also partly dependent on output resolution, because of the down-sampling.)
    Hardware Requirements: SM3.0 allows us to dramatically reduce the number of required passes (from 10-15 to 4-5.) But I don’t think I have time to write two sets of shaders. Will have to see what our recommended requirements would be.
    Note: Other effects can be achieved while we are at it here (e.g. glow maps.)
  6. Tonemapping:
    In effect, this is integrated into the last pass, but it’s too important not to get its own pass!
    Hardware Requirements: Nothing special that I can see.
    Notes: I need to read more on this. The few methods I have tried give great results but only in certain situations.
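
The fallback mentioned in passes 1 and 2, packing a depth value into an A8R8G8B8 texel, boils down to spreading one high-precision number over four 8-bit channels. The sketch below shows that idea on the CPU side only, as 32-bit fixed point split into bytes. The names `argb8`, `pack_depth` and `unpack_depth` are invented for this example, and a real shader version would additionally have to account for how each 8-bit channel is normalized when it is read back.

```c
#include <stdint.h>

/* One A8R8G8B8 texel, most significant channel first. */
typedef struct {
    uint8_t a, r, g, b;
} argb8;

/* Packs a depth in [0, 1] into four 8-bit channels (32-bit fixed point).
   Out-of-range and NaN inputs are clamped into [0, 1]. */
argb8 pack_depth(float depth)
{
    double d = (depth > 0.0f) ? (double)depth : 0.0;
    if (d > 1.0)
        d = 1.0;
    uint32_t fixed = (uint32_t)(d * 4294967295.0 + 0.5);
    argb8 texel;
    texel.a = (uint8_t)(fixed >> 24);
    texel.r = (uint8_t)(fixed >> 16);
    texel.g = (uint8_t)(fixed >> 8);
    texel.b = (uint8_t)fixed;
    return texel;
}

/* Reassembles the four channels into a depth in [0, 1]. */
float unpack_depth(argb8 texel)
{
    uint32_t fixed = ((uint32_t)texel.a << 24) | ((uint32_t)texel.r << 16)
                   | ((uint32_t)texel.g << 8)  | (uint32_t)texel.b;
    return (float)((double)fixed / 4294967295.0);
}
```

The price of this fallback is a few extra instructions on every write and read of the depth texture, compared with sampling an FP32 target directly.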

The easiest way of putting all this together is with a many-pass compositor. Yet OGRE does not let me access the intermediate textures easily (but I’ve seen this in the OGRE compositor demo! How is that done?!) Maybe I will be forced to put them one after the other in code, or at least generate the compositors in code, which is beneficial anyway (easier adaptation to hardware and user config, etc.)

More to come.

Categories: Engine