Garshasp and Arzoor

May 27th, 2009

[Image: Garshasp and Arzoor]

The first concept art, or better to say the first illustration, for the battle between Garshasp and the monster named Arzoor.
I did it in Photoshop, with some final editing in Painter.
It is actually a concept scene for the in-game cinematic shots.

Categories: Concept Art

Parallel tasks

May 22nd, 2009

One feature implemented in the Hierarchical State Machine we are using is support for parallel states. On a standard state chart diagram, such as a UML state chart, parallel states are usually drawn inside one state, separated by dashed lines. A fork is needed to divide the incoming transition into two parallel tasks. Leaving any of these states causes a transition out of the parallel state.

An example of using parallel states in the AI of the main character is a jump state in which the character can move forward during the jump if the player presses the navigation keys. In this case, the jump itself is composed of a few sub states and a parallel state. Moving and not moving are the sub states of one of the parallel states. Pressing any navigation key moves the player and changes to the moving state; releasing the keys causes a transition to the stationary state. These state changes happen in parallel to the rest of the states related to the jumping action.

This way it would be possible to add more dimensions to the behaviors of the PCs or NPCs.
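
As a rough illustration of the jump example above, here is a minimal sketch of a state with two parallel regions. All names (JumpState, JumpPhase, MoveState) are hypothetical and are not the engine's actual state machine API; the three jump phases are also an assumption.

```cpp
#include <cassert>

// Hypothetical names throughout; not the engine's real HSM interface.
enum class JumpPhase { TakeOff, Airborne, Landing, Done };
enum class MoveState { Stationary, Moving };

struct JumpState {
    JumpPhase phase = JumpPhase::TakeOff;     // region 1: the jump sub states
    MoveState move  = MoveState::Stationary;  // region 2: runs in parallel

    // Both regions see every update; neither one blocks the other.
    void update(bool navigationKeyDown, bool phaseFinished) {
        assert(!finished() && "the parallel state has already been left");
        move = navigationKeyDown ? MoveState::Moving : MoveState::Stationary;
        if (phaseFinished) {
            phase = static_cast<JumpPhase>(static_cast<int>(phase) + 1);
        }
    }

    // Finishing the jump region leaves the whole parallel state.
    bool finished() const { return phase == JumpPhase::Done; }
};
```

The movement region changes state on every update regardless of which jump phase is active, which is the extra "dimension" of behavior the parallel state buys.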

Since we currently have a symbol definition system for storing and evaluating variable values in the scripting system, it would also be possible to simulate the join feature after a parallel state. A join means that the process stops until both of the parallel sub states have made their exit transition. Each of the two parallel states sets a symbol upon exit; the join state checks for the two symbols and, if both are set, makes the exit transition. We haven’t used this feature yet.
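
The symbol-based join could look roughly like the sketch below. SymbolTable and JoinState are hypothetical stand-ins, not the real scripting system; the only idea taken from the post is that each region sets a symbol on exit and the join waits for both.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical stand-in for the scripting system's symbol store.
class SymbolTable {
public:
    void set(const std::string& name) { symbols_.insert(name); }
    bool isSet(const std::string& name) const { return symbols_.count(name) != 0; }
private:
    std::unordered_set<std::string> symbols_;
};

// A join state: the exit transition may fire only after both parallel
// regions have announced their own exit by setting a symbol.
class JoinState {
public:
    JoinState(std::string first, std::string second)
        : first_(std::move(first)), second_(std::move(second)) {
        assert(first_ != second_);  // two distinct regions are expected
    }
    bool canExit(const SymbolTable& symbols) const {
        return symbols.isSet(first_) && symbols.isSet(second_);
    }
private:
    std::string first_;
    std::string second_;
};
```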

Categories: Engine

The Resource Loop

May 16th, 2009

One major engine task currently being worked on is resource management, as part of the Seamlessness feature being added to the game engine. Resources here means 3D mesh files, textures, sound files, animation files and so on. The seamlessness design breaks the world down into chunks which are loaded and unloaded in real time as the player progresses through the levels. Ideally this should improve rendering performance and reduce memory consumption and the general CPU load from AI, physics and animation processing.
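
To make the chunk idea concrete, here is a small sketch of deciding which chunks to load and unload as the player moves. It assumes the chunks sit on a regular grid and that a fixed radius of cells around the player stays resident; both are assumptions for illustration, and every name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <set>
#include <utility>

using ChunkId = std::pair<int, int>;  // grid coordinates of a world chunk

// All chunks whose cell lies within `radius` cells of the player's cell.
std::set<ChunkId> chunksAround(float playerX, float playerZ,
                               float chunkSize, int radius) {
    assert(chunkSize > 0.0f && radius >= 0);
    const int cx = static_cast<int>(std::floor(playerX / chunkSize));
    const int cz = static_cast<int>(std::floor(playerZ / chunkSize));
    std::set<ChunkId> wanted;
    for (int x = cx - radius; x <= cx + radius; ++x) {
        for (int z = cz - radius; z <= cz + radius; ++z) {
            wanted.insert(ChunkId(x, z));
        }
    }
    return wanted;
}

struct ChunkDelta {
    std::set<ChunkId> toLoad;
    std::set<ChunkId> toUnload;
};

// Compares what is resident with what is wanted; the two result sets are
// the requests a background loader would then work through.
ChunkDelta diffChunks(const std::set<ChunkId>& resident,
                      const std::set<ChunkId>& wanted) {
    ChunkDelta delta;
    for (const ChunkId& c : wanted) {
        if (resident.count(c) == 0) delta.toLoad.insert(c);
    }
    for (const ChunkId& c : resident) {
        if (wanted.count(c) == 0) delta.toUnload.insert(c);
    }
    return delta;
}
```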

This feature needs to connect to the resource management hooks and start getting tested. It will probably be the last major architecturally significant feature we’ve planned for at this stage of development. The most critical resources are related to graphics and we’re currently trying to investigate the mechanisms OGRE provides for handling resources. One major feature which is definitely necessary for this design is the background resource load/unload which OGRE supports in a separate application thread.

The tools needed for Seamlessness and Resource Management are being built into the level editor (Iranvij) in parallel.

Implementing this feature properly needs great focus from the technical and artistic departments since it affects the processes and main workflows of both.

Categories: Editor, Engine

Khishma

May 12th, 2009

Khishma is the Daeva of wrath also known as Aeshma.

Concept Art:

[Image: Khishma concept art]

[Image: Khishma concept art]

Khishma 3D model:

[Image: Khishma 3D model]

Categories: 3D character, Concept Art

Time, Only Time

May 9th, 2009

Keeping track of time is a sensitive problem. Designing a good system for this is in many ways one of the most basic and most crucial tasks a game engine developer has. One of the reasons is that most (if not all) game engines are simulation engines at their core, and without proper treatment and handling of time, nothing much else can be handled and treated.

Anyways, I’m not going to talk about the design of the time system that we have implemented in Zorvan (that’s what we call our game engine) but about its implementation, and not about the whole implementation, but rather about how we actually read time from the system, and the pitfalls and problems associated with it.

Basically, when writing your programs in C, and on Windows on a PC, you have a range of options for reading the absolute time. The basis of this absolute time is not important because we’ll be only working with time deltas and almost never with the value of the time itself. Let’s call each source of such an absolute time a “time source”(!)

There are a few parameters that we should be concerned about in a time source. The first is precision: the frequency, or the smallest time value that a source can actually measure, or the number of meaningful digits in its return value. The next one is the update interval. For example, a time source may advertise that it can measure time in micro- or nanoseconds, but its value may only change (or be updated) once a millisecond. The third parameter is the overhead of actually reading time from that source. If a source only writes its data to a disk file and you have to read it from there, it’s gonna be no good for you whether it measures time in femtoseconds or not. You are going to spend a few milliseconds reading that number anyway (if not more) and it won’t be any good then.

The two last parameters can be combined, but I thought I’d make a distinction because they are indeed different to a programmer and they present different symptoms in the application.
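
Both of these parameters can be measured with a few lines of code. The sketch below is only an illustration: it uses std::chrono::steady_clock as a portable stand-in time source, and the function names are made up for this example. Any function returning an integer tick value can be passed in instead.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>

// Stand-in time source returning nanoseconds; swap in the source under test.
inline std::int64_t readNanoseconds() {
    using namespace std::chrono;
    return duration_cast<nanoseconds>(
        steady_clock::now().time_since_epoch()).count();
}

// Update interval: spin until the reported value changes and keep the
// smallest jump seen over a number of samples.
template <typename Source>
std::int64_t smallestObservedJump(Source read, int samples) {
    assert(samples > 0);
    std::int64_t smallest = std::numeric_limits<std::int64_t>::max();
    for (int i = 0; i < samples; ++i) {
        const std::int64_t before = read();
        std::int64_t after = read();
        while (after == before) after = read();
        if (after - before < smallest) smallest = after - before;
    }
    return smallest;
}

// Call overhead: average wall time per call over many back-to-back calls.
template <typename Source>
double averageCallOverheadNs(Source read, int calls) {
    assert(calls > 0);
    volatile std::int64_t sink = 0;  // keeps the calls from being optimized away
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < calls; ++i) sink = read();
    const auto stop = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double, std::nano>(stop - start).count() / calls;
}
```

A source with a coarse update interval shows up as a large smallest jump even when each call is cheap, which is why the two numbers are worth reporting separately.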

OK, enough with the pep talk. Let’s get down to business. The most obvious time sources are the CRT clock(), the Win32 API’s GetTickCount() and the Win32 Multimedia API’s timeGetTime(). These are all millisecond sources. That is, they all make you think they have millisecond precision. On Windows, the first two actually have an awful update interval of 15-16 milliseconds (any old DOS programmer should find this number painfully nostalgic!) Of course, this number is not fixed and you should not assume it, but I don’t remember encountering any different behaviour from these two calls in recent years. The multimedia timer actually does have millisecond precision and it does indeed get updated every actual millisecond, but the API documentation asserts that you cannot count on that either (although you can ask the OS to make an effort to update this timer value at any multiple of a millisecond that you want.) Among these three, the multimedia timer combines the finest update interval with a small call overhead and is obviously the best of them.

In any case, millisecond precision is not enough for most of the timing needs of a game engine, but it may be enough for the most important use, measuring the frame time. For games that have a framerate below 100, this should work out well enough for now, but you should think about the future too. In general, don’t use this to measure any duration less than a second in a game.

The next time source on Windows, with more precision and supposedly better behavior is the high performance timer (or whatever) aka QueryPerformanceCounter(). This allegedly uses one of the hardware counters on your system and produces close to microsecond (or a little better) precision. However, the frequency of this time source is not fixed from system to system and you have to query it from the OS (using QueryPerformanceFrequency()) at runtime, but it is guaranteed not to change when the computer is up and running (I don’t know what happens across system hibernations, and frankly, I don’t want to find out!) On my system, and a few others I’ve tested the frequency of this source is 3’579’545 ticks per second, which gives it a precision of 280 nanoseconds. However, as my tests show, each invocation of this function takes about 2 microseconds (on my test system) which makes the overhead and latency rather high compared to its precision. However, for timing frames and other non-critical code (out of your precious inner loops) this is probably your best bet.
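
The arithmetic behind those numbers is simple enough to sketch. The helpers below are hypothetical and take plain integers so they stay portable; on Windows the tick values would come from QueryPerformanceCounter() and the frequency from QueryPerformanceFrequency().

```cpp
#include <cassert>
#include <cstdint>

// Converts a tick delta into seconds. The frequency (ticks per second) is
// queried once at runtime and passed in, never hard-coded.
inline double ticksToSeconds(std::int64_t tickDelta, std::int64_t frequencyHz) {
    assert(frequencyHz > 0);
    return static_cast<double>(tickDelta) / static_cast<double>(frequencyHz);
}

// Length of a single tick in nanoseconds, i.e. the precision of the source.
inline double tickLengthNanoseconds(std::int64_t frequencyHz) {
    assert(frequencyHz > 0);
    return 1.0e9 / static_cast<double>(frequencyHz);
}
```

With the frequency quoted above, tickLengthNanoseconds(3579545) comes out a little under 280 nanoseconds, matching the figure in the text.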

One largely curious (not to mention disturbing) behavior we observed recently in Zorvan while using this method for time calculations was that it returned the same number in two invocations a few milliseconds apart which resulted in all sorts of erratic behaviors, but I haven’t had time to investigate it in depth and I haven’t been able to reproduce it again.

Perhaps the most pervasive method for obtaining time in game engines is the rdtsc (Read Time-Stamp Counter) x86 instruction, which has been available since the original Pentium CPUs. It has no parameters and returns the number of CPU cycles that have passed since the CPU was reset, as a 64-bit number in EDX:EAX. The instruction is lightweight and low-overhead, has a very high precision (almost the highest precision possible, because any lapse of time smaller than the CPU clock cycle is hardly meaningful or measurable in general computing) and is available everywhere (on all PCs anyway, and it’s obviously not Windows-specific.)

For those who are afraid of inline assembly, there is even a convenient intrinsic available in Visual C++ (include “intrin.h” and call __rdtsc()). (The GCC inline assembly call is left as an exercise for the reader!)
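
For readers who would rather skip the exercise, one possible form of the GCC/Clang inline assembly is sketched below. The function names are made up for this example, the steady_clock fallback for non-x86 builds is an addition of this sketch, and cyclesToSeconds() assumes a constant clock rate that you have measured yourself.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
// rdtsc leaves the low 32 bits in EAX and the high 32 bits in EDX.
inline std::uint64_t readTimeStampCounter() {
    std::uint32_t low, high;
    __asm__ __volatile__("rdtsc" : "=a"(low), "=d"(high));
    return (static_cast<std::uint64_t>(high) << 32) | low;
}
#else
// Fallback so the sketch still builds elsewhere: nanoseconds, not cycles.
inline std::uint64_t readTimeStampCounter() {
    using namespace std::chrono;
    return static_cast<std::uint64_t>(duration_cast<nanoseconds>(
        steady_clock::now().time_since_epoch()).count());
}
#endif

// Converts a cycle count to seconds for a known, constant clock rate.
inline double cyclesToSeconds(std::uint64_t cycles, double cpuHz) {
    assert(cpuHz > 0.0);
    return static_cast<double>(cycles) / cpuHz;
}
```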

But nothing is free. There are a few problems attached to the use of rdtsc which fall mostly in two categories: problems caused by multi-CPU systems and those that result from its interaction with CPU power saving schemes.

In multi-CPU systems (including multi-cores), the different CPUs may not have started counting cycles at exactly the same time, which makes the results read from different cores differ slightly. This can happen very easily because normally your code can run on different CPUs at different times (across task switches), and it can cause the time seen by the application to appear to go backwards (imagine trying to feed a negative time delta to your physics solver!) The first time I saw this was on an AMD Athlon 64 X2. Although the problem is solvable with an official patch from AMD, it has left me forever afraid of encountering it again! In any case, I haven’t seen this problem on Intel Core architecture CPUs and I haven’t had access to an AMD Phenom or Opteron to test.
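
Whatever the cause, one defensive measure is to never hand a negative delta to the rest of the engine. The sketch below shows one way to do that; FrameClock is a hypothetical helper, not Zorvan's time system, and holding time still until the counter catches up is just one possible policy.

```cpp
#include <cassert>
#include <cstdint>

// Turns raw counter readings into frame deltas that are never negative.
class FrameClock {
public:
    explicit FrameClock(double ticksPerSecond)
        : ticksPerSecond_(ticksPerSecond) {
        assert(ticksPerSecond > 0.0);
    }

    // Returns the seconds elapsed since the previous reading. A reading
    // that is not ahead of the last accepted one yields a delta of zero.
    double advance(std::uint64_t rawTicks) {
        if (!started_) {
            started_ = true;
            last_ = rawTicks;
            return 0.0;
        }
        if (rawTicks <= last_) return 0.0;  // hold time still, never go back
        const std::uint64_t delta = rawTicks - last_;
        last_ = rawTicks;
        return static_cast<double>(delta) / ticksPerSecond_;
    }

private:
    double ticksPerSecond_;
    std::uint64_t last_ = 0;
    bool started_ = false;
};
```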

The second group of problems happens because modern CPUs may vary and adapt their clock rates at runtime to different loads (this is especially common on mobile-class CPUs.) This means that the CPU clock rate you measure (e.g. at the start of your application) may not stay the same during the lifetime of your application and may go down or up (if the initial load was low.) In either case, it will wreak havoc on your time calculations.

But don’t despair! While solving the first issue is rather ugly in applications (you have to bind your time-reading thread to a single CPU (it’s called setting the “CPU affinity” for that thread; STFW yourselves)) and solving the second problem is impractical in application code, the second category of problems can be solved rather easily (IMHO) in the CPU itself. It just has to always report the time-stamp counter value according to the highest clock rate. And I suspect that CPUs actually do this, because I haven’t encountered problems of the second category yet and it’s only a theoretical problem for me for the time being. I may just be lucky, but I don’t believe so!

Anyway, for your reference and for purposes of comparison, I have measured some relevant timing values for the 5 time sources discussed above on my laptop (a T7200 CPU, i.e. Intel Core 2 Duo, 2GHz), which are presented in the table below:

Time Source                 Call Overhead (microseconds)   Min. Value Jump in Two Successive Calls   Frequency (Hz)   Precision (MHz^-1, i.e. microseconds)
clock()                     0.03822                        15                                        1000             1000
GetTickCount()              0.00313                        15                                        1000             1000
timeGetTime()               0.02026                        1                                         1000             1000
QueryPerformanceCounter()   1.921                          5                                         3579545          0.279365
rdtsc                       0.000516                       60                                        2000000000       0.0005

Each number is measured and averaged over about 100 million iterations.

There are of course a couple more methods of reading time from other hardware sources, but their availability and parameters are rather system dependent and I won’t go into the topic any further.

Categories: Engine

Brains

May 9th, 2009

For the main decision-making process of the AI in the game, we use good old state machines. Very recently we tried to make these state machines data driven, so that they can be tweaked easily by the designers and give us much more flexibility.

The grand design was borrowed from an interesting article in AI Game Programming Wisdom 4 about hierarchical dynamic state charts. It has now been implemented in the engine and integrated with the in-house scripting system at the top layer and the core Game Object component system at the bottom layer. All NPCs now benefit from the data-driven state machines, which use a basic XML file for their configuration. This XML definition is the standard SCXML used for defining state charts, like the UML state charts. A graphical tool has been implemented in Iranvij (the world editor) as part of the Behavior Tools we had planned to build. The state chart nodes can be created and connected using this graphical tool, and once the architecture of the state chart is set, events can be set on the transitions and scripts added to the state entry/exit functions.
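
The core of the data-driven idea is that transitions live in a table filled at load time instead of in code. The sketch below is a deliberately flat (non-hierarchical) illustration with hypothetical names; in the setup described above the table would be filled from the SCXML file rather than by hand.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// A flat state chart whose transitions are plain data.
class StateChart {
public:
    explicit StateChart(std::string initial) : current_(std::move(initial)) {
        assert(!current_.empty());
    }

    // One table row: in state `from`, `event` leads to state `to`.
    void addTransition(const std::string& from, const std::string& event,
                       const std::string& to) {
        table_[std::make_pair(from, event)] = to;
    }

    // Returns true if the event caused a transition from the current state.
    bool handle(const std::string& event) {
        const auto it = table_.find(std::make_pair(current_, event));
        if (it == table_.end()) return false;
        current_ = it->second;
        return true;
    }

    const std::string& current() const { return current_; }

private:
    std::map<std::pair<std::string, std::string>, std::string> table_;
    std::string current_;
};
```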

The functionality seems to work fine currently; one concern is performance, as usual, since this decoupled, flexible design is not necessarily in line with improving performance (is there any design goal for software in line with improving performance at all?). For this we would have to profile a busy scene and really find out.

This new design transfers a lot of game logic out of the main engine code and into the scripting system. The scripting system we currently use requires simple commands to be developed in the engine and called from the script. We have to see how far we can go with this design; we will eventually either extend our scripting features or switch to a standard scripting language such as Lua or AngelScript and integrate that with the engine code.

The only dude in the game whose AI is not yet data driven is the infamous Garshasp himself. Once we’re sure everything is going well with the new design, Garshasp will delegate his behavior definitions to a few data files outside the executable binary.

Categories: Engine

Arzoor, a 3D character pipeline

May 6th, 2009

Here is a Construction shot along with the Arzoor Zbrush model and a Beauty shot, just to show the technical part of our character art pipeline.

[Image: arzoor3dmodel_wf_low]

This construction shot is a screenshot from the Max viewport. We use ShaderFX for testing normal maps inside Max during development of the character and before importing it into the engine. This is a shader I have made that supports 4 real-time lights and an overall ambient light. The maps to use with this shader are: Normal, Diffuse, Specular and Alpha.

[Image: arzoor3dmodel_zb_bs_low]

Almost every character in the game takes the same path to come alive (animation aside), so let me explain it. The model is first sculpted in ZBrush, then retopologized inside ZBrush to make the low-poly mesh from the original sculpt. Then every part of the low-res mesh is taken to 3ds Max for unwrapping and normal baking. Right now we bake our normals inside Max. In the 3ds Max environment we can easily adjust the cage mesh for accurate normal map generation and easy bug fixes. But a problem arises when your high-resolution ZBrush mesh goes really HIGH!! At the moment we usually break the high-res mesh up according to the low-res unwrapped chunks. Recently, though, 3mm has proposed using xNormal for baking the normal maps, as it is a standalone program made just for normal baking (and of course some other great tasks). But using it requires solving some other problems (like making an additional cage). 3mm volunteered to take this tedious path for our next boss character, so I hope this new pipeline will work for us!


The Curse of the Polygon

May 3rd, 2009

The number of polygons in a 3D model was a main concern for us when we started off the development for Garshasp. After a little experience and testing with our rendering engine, OGRE, it was apparent that the number of polygons wasn’t as critical as we thought and other factors such as rendering batch counts had more effects on draining the frame rate. In the second wave of our development, we relaxed the polygon counts a little bit and aimed at higher polygon art.

This trend continued until very recently when yzt, our Director of Tools and Technology, profiled the system using Intel VTune and surprisingly found out that currently one main bottleneck for system performance is the skinning calculations done for animating the 3D characters.

We are currently using software skinning, which means all the calculations happen on the CPU. Every added polygon means added vertices, and for each vertex a computation has to be made every frame to calculate its correct position from the character skeleton’s bone positions. More polygons mean more calculations per frame.
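
To show why the cost grows with the vertex count, here is a minimal sketch of software skinning using plain linear blending. This is not OGRE's implementation; the types are hypothetical and the limit of four bone influences per vertex is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Row-major 3x4 affine bone matrix (rotation/scale plus translation).
struct BoneMatrix {
    float m[3][4];
    Vec3 transform(const Vec3& p) const {
        return { m[0][0] * p.x + m[0][1] * p.y + m[0][2] * p.z + m[0][3],
                 m[1][0] * p.x + m[1][1] * p.y + m[1][2] * p.z + m[1][3],
                 m[2][0] * p.x + m[2][1] * p.y + m[2][2] * p.z + m[2][3] };
    }
};

struct SkinnedVertex {
    Vec3 bindPosition;
    std::array<std::size_t, 4> bone;  // indices into the bone array
    std::array<float, 4> weight;      // expected to sum to 1
};

// Runs once per frame: every vertex is transformed by each bone that
// influences it and the results are blended by weight.
std::vector<Vec3> skin(const std::vector<SkinnedVertex>& vertices,
                       const std::vector<BoneMatrix>& bones) {
    std::vector<Vec3> out;
    out.reserve(vertices.size());
    for (const SkinnedVertex& v : vertices) {
        Vec3 sum{0.0f, 0.0f, 0.0f};
        for (std::size_t i = 0; i < 4; ++i) {
            if (v.weight[i] == 0.0f) continue;
            assert(v.bone[i] < bones.size());
            const Vec3 p = bones[v.bone[i]].transform(v.bindPosition);
            sum.x += v.weight[i] * p.x;
            sum.y += v.weight[i] * p.y;
            sum.z += v.weight[i] * p.z;
        }
        out.push_back(sum);
    }
    return out;
}
```

Hardware skinning moves this same inner loop into a vertex shader, so the work is done per vertex on the GPU instead of in this CPU loop.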

This bottleneck would be another reason to select lower polygon models but before doing so we tried out hardware skinning in which the calculations for vertex positions are handled in a vertex shader. Embracing the massive parallel architecture of the GPU, these calculations can be handled very efficiently and the initial tests by yzt so far proved to be successful. We need to delve into it a bit more and our next challenge would be to limit the number of bones in character skeletons to comply with vertex shader limitations.
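
The bone limit comes from the number of constant registers a vertex shader can address. A back-of-the-envelope helper is sketched below; the function is hypothetical, and the figures in the note after it (256 registers, 3 registers per bone for a 3x4 matrix, 16 reserved for other shader constants) are example inputs, not measurements from our shaders.

```cpp
#include <cassert>

// How many bone matrices fit into one draw call, given the vertex shader's
// constant register budget.
inline int maxBonesPerDraw(int constantRegisters, int reservedRegisters,
                           int registersPerBone) {
    assert(registersPerBone > 0);
    assert(reservedRegisters >= 0 && constantRegisters >= reservedRegisters);
    return (constantRegisters - reservedRegisters) / registersPerBone;
}
```

For example, maxBonesPerDraw(256, 16, 3) gives 80 bones; skeletons above the limit have to be simplified or the mesh split into several draws.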

We’ll re-enter the loop to attack the next performance eater after this issue is solved.

Categories: Engine

World Creation Paradox

April 28th, 2009

For months we saw ourselves come to the same major question whenever we wanted to plan for the level creation tasks. The question was basically, “Where should we start from?”.

There are always at least three main poles which need proper attention for a well-balanced output: 1- Visuals (Graphics), 2- Gameplay (Fun factor), 3- Technical Feasibility (coding). If you want to create a level which looks great, is really fun to play and can be built within the technical or project constraints, then the three elements seem to form a closed loop and it is hard to decide where exactly the process should start. Should we worry about the visuals first? Should we see what we can achieve technically and constrain the rest, or should we consider the fun gameplay elements and then add the other layers?

There seems to be no strict answer to the above issue, and after lots of discussion we came up with a specific methodology, which we are currently following. Seeing a presentation from the BioWare team at this year’s GDC on the approach used for Mass Effect 2 strengthened our confidence in the selected methodology.

What we are doing currently is to come up with the general visual concepts first, prepare the first phase for the level map which is 2D and then a 3D spatial map using boxes, fill it with the first stage of 3D models, play test the crude (un-textured) level in the game and implement the necessary code features and then re-iterate the loop by polishing the concepts, the 3D elements and gameplay testing and level design tweaking.  We have planned for four iterations. So far it seems to be a good choice although we need more time to really be able to tune the methodology and level development pipeline.

This task is very critical since it requires very close collaboration between members of different departments. It’s quite fun also.


Garshasp video game early concept arts-01

April 27th, 2009

"GARSHASP"

This is the very first painting of Garshasp’s appearance, done to study what his muscles and anatomy look like.

Categories: Concept Art, Uncategorized

Environment 3D contents

April 27th, 2009

As our phased environment building is still in its second phase, I should point out one or two issues. Here are the pros and cons of this method from my point of view:

Pros:

-If you are able to guess what was in the designer’s mind when drawing those lines on paper or a 2d digital image!!!, you won’t get a headache making meshes for the first phase of 3D environment contents.

Cons:

-If you are not able to guess what was in the designer’s mind when drawing those lines on paper or a 2d digital image!!!, you will get a headache making meshes for the first phase of 3D environment contents.

These two main factors of phase two resonate with each other to the point of total agony. The main issue is that the designer should have some meshes to test the gameplay, but he doesn’t know yet what exactly he wants. Maybe all these words have one big reason:

“don’t work late at nights, especially alone, with music turned off!!”

And a word for the gameplay designers and testers: even while making just a very rough 3D sketch of the environment, I have seen the huge amount of time needed to make these enormous levels PLAYABLE, apart from the final 3D contents.

Should we revise the plans???

Categories: Environment Art, Uncategorized

Behavior Tool – Seamlessness

April 27th, 2009

We have felt the need for a specific tool in our editor for a long time now: something to let us focus on a specific character and tweak its properties, whether behaviors, animation, physics or any other property. We never quite dedicated the time to implement the features we needed. However, some bursts of inspiration struck me recently when I went over the Havok Behavior Tool, which is a very nice and neat product.

Our plan for this week is to implement a few features in our editor, Iranvij, which will enable us to view a character in an isolated window, run the different animations with a time tracker and simulate this character alone to view the different behaviors from different states. This will be a big addition to Iranvij. The state chart for every character can be edited visually thanks to the new data driven Hierarchical Finite State Machines used for the characters and the visual graph features in our editor.

Another major feature which is being added to the game engine is the Seamlessness feature which will enable huge game worlds to be loaded and unloaded on the fly without any load screens halting the game experience. The main parts have already been implemented in the game engine, Zorvan, and now the necessary tools have to be added to Iranvij. This is going to be a big addition to the engine capabilities.

Categories: Editor, Engine

Render Pipeline Rewrite

April 26th, 2009

We’re in the process of redesigning and rewriting the CPU and GPU-side code for Garshasp’s rendering pipeline. What I’m thinking about (and what we more-or-less have already) is an HDR, per-pixel-lit rendering with (pretty slow) shadow maps.
The new pipeline that I’m thinking about is like this:

  1. Shadow Pass (once per shadow-casting light that affects the frustum):
    Render all shadow-casting geometry from the light view point and keep all the depth values in the render target.
    Hardware requirements: FP32 or (at least) FP16 texture support. I can pack the depth value in an A8R8G8B8 texture if this turns out to be a limiting requirement.
    Notes: OGRE handles this by default, and it probably does a better job than me (I never got the hang of all the different methods for PSM frustum calculations. :) )
  2. Depth pass:
    Do an initial rendering of all opaque geometry to initialize the Z-buffer and also write the view-space pixel depths to a floating-point texture (because we SM3.0-era PC-developers can’t read from the Z-buffer directly.)
    Hardware requirements: FP32 or (at least) FP16 texture support. However, I can pack the depth value in an A8R8G8B8 texture if this turns out to be a limiting requirement.
    Notes: I should investigate interactions of this with MSAA.
  3. Shadow Map Generation Pass:
    Render all shadow-receiving geometry and calculate whether each pixel is in shadow or not (and how much) using the information from the last two passes. The PCF and gang should be run here. If we decide to allow multiple shadow casting lights, we can do this calculation for four of them in this pass and write the result in different components of a single A8R8G8B8 render target. More will need MRT.
    Hardware Requirements: Nothing special.
    Notes: OGRE claims that it handles this by default. But it also provides an “Integrated Shadow” option which should give me more control (and more chances to mess things up!) I should think about whether I can integrate this pass with the last one. The only problems I see are the MRT prerequisite and the different render target bit-depth requirements (8 vs. 32.)
  4. Render Pass:
    Render the glorious scene including the translucent objects and objects that need special treatment (fog volumes, light volumes, water,) using the shadow map generated in the last pass and the depth values from the pass before that (for the special effects.)
    Hardware Requirements: Will probably need SM3.0 or SM2.x if we want to support more than one or two lights. This is preferred to turning this into multiple passes, because of the number of our triangles and the large number of animations (rather costly vertex programs.) Also, will require FP_ARGB_16 for the render target; I’m hoping that every SM2.x card supports this with ease.
    Notes:
  5. HDR Bloom:
    Very effective visually, although requires many, many passes (the number is also partly dependent on output resolution, because of the down-sampling.)
    Hardware Requirements: SM3.0 allows to dramatically reduce the number of required passes (from 10-15 to 4-5.) But I don’t think I have time to write two sets of shaders. Will have to see what our recommended requirements would be.
    Notes: Other effects can be achieved while we are at it here (e.g. glow maps.)
  6. Tonemapping:
    In effect, this is integrated into the last pass, but it’s too important not to get its own pass!
    Hardware Requirements: Nothing special that I can see.
    Notes: I need to read more on this. The few methods I have tried give great results but only on certain situations.
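
The A8R8G8B8 fallback mentioned in passes 1 and 2 boils down to spreading one depth value over four 8-bit channels. The sketch below is a CPU-side reference of that idea, not shader code; the function names are made up, and it assumes the depth has already been normalized to the range [0, 1).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Rgba8 = std::array<std::uint8_t, 4>;

// Packs a depth in [0, 1) into four 8-bit channels, most significant first.
inline Rgba8 packDepth(double depth) {
    assert(depth >= 0.0 && depth < 1.0);
    // Scale to 32-bit fixed point; 4294967296 is 2^32.
    const std::uint32_t fixed = static_cast<std::uint32_t>(depth * 4294967296.0);
    return { static_cast<std::uint8_t>(fixed >> 24),
             static_cast<std::uint8_t>((fixed >> 16) & 0xFF),
             static_cast<std::uint8_t>((fixed >> 8) & 0xFF),
             static_cast<std::uint8_t>(fixed & 0xFF) };
}

// Reassembles the depth from the four channels.
inline double unpackDepth(const Rgba8& c) {
    const std::uint32_t fixed = (static_cast<std::uint32_t>(c[0]) << 24) |
                                (static_cast<std::uint32_t>(c[1]) << 16) |
                                (static_cast<std::uint32_t>(c[2]) << 8) |
                                static_cast<std::uint32_t>(c[3]);
    return static_cast<double>(fixed) / 4294967296.0;
}
```

A shader version would do the same with multiplies and frac() on floats, and would lose a little of this precision along the way.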

The easiest way of putting all this together is with a many-pass compositor. Yet OGRE does not let me access the intermediate textures easily (but I’ve seen this in the OGRE compositor demo! How is that done?!) Maybe I will be forced to put them one after the other in code, or at least generate the compositors in code, which is beneficial any way (easier adaptation to hardware and user config, etc.)

More to come.

Categories: Engine