Over time, I have adopted several coding patterns for writing readable and debuggable multi-condition game code, which I systematically follow.

By multi-condition code, I mean code where some final logic is executed only after multiple condition tests have passed. If a test fails, the remaining tests are skipped for efficiency, effectively achieving short-circuit evaluation.

**NOTE:** I only mean to use this post to share some coding patterns I find useful, and I do NOT intend to suggest nor claim that they are “the only correct ways” to write code. Also, please note that it’s not all black and white. I don’t advocate unconditionally adhering to these patterns; I sometimes mix them with opposite patterns if it makes more sense and makes code more readable.

**Shortcuts:**

Early Outs

Debuggable Conditions

Debug Draw Locality

Forcing All Debug Draws

There are two ends of the spectrum regarding this subject: **early outs** versus **single point of return**. Both camps have valid arguments, but I lean towards the early-out camp. The early-out style is the foundation of all the patterns presented in this post.

As an example, consider the scenario where we need to test if a character is facing the right direction, if the weapon is ready, and if the path is clear, before finally executing an attack.

This is one way to write it:

    FacingData facingData = PrepareFacingData();
    if (TestFacing(facingData))
    {
      WeaponData weaponData = PrepareWeaponData();
      if (TestWeaponReady(weaponData))
      {
        PathData pathData = PreparePathData();
        if (TestPathClear(pathData))
        {
          Attack();
        }
      }
    }

The code block above is written in the so-called **single-point-of-return** style. The logic flow is straightforward and always ends at the bottom of the code block. If the code block is wrapped inside a function, the function will have one single point of return, i.e. at the bottom of the function:

    void TryAttack()
    {
      FacingData facingData = PrepareFacingData();
      if (TestFacing(facingData))
      {
        WeaponData weaponData = PrepareWeaponData();
        if (TestWeaponReady(weaponData))
        {
          PathData pathData = PreparePathData();
          if (TestPathClear(pathData))
          {
            Attack();
          }
        }
      }

      // return here
    }

The same logic in **early-out** style would look something like this, returning right where the first test fails:

    void TryAttack()
    {
      FacingData facingData = PrepareFacingData();
      if (!TestFacing(facingData))
        return;

      WeaponData weaponData = PrepareWeaponData();
      if (!TestWeaponReady(weaponData))
        return;

      PathData pathData = PreparePathData();
      if (!TestPathClear(pathData))
        return;

      Attack();
    }

I like this style better, because the intent to get out of the function right after the first failing test is very clear. Also, as the number of tests gets large, it avoids excessive indentation, which many jokingly call “indent hadouken.”

PHP Streetfighter #php pic.twitter.com/5dQ5H1UrB2

— Paul Dragoonis (@dr4goonis) June 11, 2014

Early outs don’t just apply to functions. They apply to loops as well. For example, this is how I would iterate through characters and collect those who can attack:

    for (Character &c : characters)
    {
      FacingData facingData = PrepareFacingData(c);
      if (!TestFacing(facingData))
        continue;

      WeaponData weaponData = PrepareWeaponData(c);
      if (!TestWeaponReady(weaponData))
        continue;

      PathData pathData = PreparePathData(c);
      if (!TestPathClear(pathData))
        continue;

      charactersWhoCanAttack.Add(c);
    }

One special case where early outs are not quite possible without refactoring is when there’s more code that must always be executed after the condition tests.

In the single-point-of-return style, it may look like this:

    void TryAttack()
    {
      FacingData facingData = PrepareFacingData();
      if (TestFacing(facingData))
      {
        WeaponData weaponData = PrepareWeaponData();
        if (TestWeaponReady(weaponData))
        {
          PathData pathData = PreparePathData();
          if (TestPathClear(pathData))
          {
            Attack();
          }
        }
      }

      // always execute
      PostAttackTry();
    }

But using the early-out style, we’d have a problem:

    void TryAttack()
    {
      FacingData facingData = PrepareFacingData();
      if (!TestFacing(facingData))
        return;

      WeaponData weaponData = PrepareWeaponData();
      if (!TestWeaponReady(weaponData))
        return;

      PathData pathData = PreparePathData();
      if (!TestPathClear(pathData))
        return;

      Attack();

      // Uh, oh. Not always executed!
      PostAttackTry();
    }

If you’re comfortable with using forward-only `goto`s to jump to the `PostAttackTry` call and your team’s coding standards allow it, then you’re set. Otherwise, we need to keep looking for solutions.

What about calling `PostAttackTry` wherever the function returns?

    void TryAttack()
    {
      FacingData facingData = PrepareFacingData();
      if (!TestFacing(facingData))
      {
        PostAttackTry();
        return;
      }

      WeaponData weaponData = PrepareWeaponData();
      if (!TestWeaponReady(weaponData))
      {
        PostAttackTry();
        return;
      }

      PathData pathData = PreparePathData();
      if (!TestPathClear(pathData))
      {
        PostAttackTry();
        return;
      }

      Attack();
      PostAttackTry();
    }

This is also not good. The moment someone adds a new return while forgetting to also add a call to `PostAttackTry`, the logic breaks.

In this case, if the logic is trivial, I’d be okay with just using the single-point-of-return style. Otherwise, I would refactor the tests into a separate function, while maintaining the early-out style:

    bool CanAttack()
    {
      FacingData facingData = PrepareFacingData();
      if (!TestFacing(facingData))
        return false;

      WeaponData weaponData = PrepareWeaponData();
      if (!TestWeaponReady(weaponData))
        return false;

      PathData pathData = PreparePathData();
      if (!TestPathClear(pathData))
        return false;

      return true;
    }

    void TryAttack()
    {
      if (CanAttack())
        Attack();

      PostAttackTry();
    }

[EDIT]

I’ve received feedback through multiple channels proposing the use of destructors of some helper class to invoke the final logic, a la scope-based resource management (RAII).

For me, the acceptable uses of destructors like this are scope-based resource management, profiler instrumentation, and any logic that comes in tightly coupled “entry” and “exit” pairs, but not this.

In my acceptable cases, an exit logic is ensured to always accompany its corresponding entry logic, taking burden off programmers by preventing them from accidentally exiting a scope without executing the exit logic.

I think relying on destructors to execute arbitrary logic upon scope exit that is not coupled with any entry logic introduces unnecessary complexity and a risk of omission. When reading the code, instead of making a mental note of “something is executed here, and the accompanying exit logic will be executed upon scope exit”, readers now have to remember that “nothing is done yet, and only when the scope is exited will something happen.” To me, the latter is a mental burden that is more likely to be forgotten or overlooked, because it’s not tied to any concrete logic execution right at the code location where the helper struct is instantiated.

But what about putting the instantiation code at the end of the scope? I think that’s no good either. Because not all readers read to the bottom of the scope. They might miss out when they reach an early out that they are looking for.
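For readers unfamiliar with the suggestion being discussed, it might look roughly like the C++ sketch below. `ScopeExit` is a hypothetical helper written for this illustration, and the boolean parameters and counters are placeholders standing in for the real tests and game logic.

```cpp
#include <functional>
#include <utility>

// Hypothetical helper: runs the stored callback when it goes out of scope.
class ScopeExit
{
public:
  explicit ScopeExit(std::function<void()> onExit) : m_onExit(std::move(onExit)) { }
  ~ScopeExit() { if (m_onExit) m_onExit(); }

  ScopeExit(const ScopeExit &) = delete;
  ScopeExit &operator=(const ScopeExit &) = delete;

private:
  std::function<void()> m_onExit;
};

// Placeholder state standing in for the real game logic.
int g_attackCount = 0;
int g_postAttackTryCount = 0;

void Attack() { ++g_attackCount; }
void PostAttackTry() { ++g_postAttackTryCount; }

// The three flags stand in for the facing, weapon, and path tests.
void TryAttack(bool facingOk, bool weaponReady, bool pathClear)
{
  // Nothing is executed at this line; PostAttackTry only runs
  // when the function exits, through any of the returns below.
  ScopeExit postAttackTry([] { PostAttackTry(); });

  if (!facingOk)
    return;

  if (!weaponReady)
    return;

  if (!pathClear)
    return;

  Attack();
}
```

Every exit path triggers `PostAttackTry`, which is the appeal of the approach. The instantiation line itself performs no visible work, which is the readability concern raised above.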

[/EDIT]

Continuing with the character attack example from above, let’s consider the scenario where each test function now returns a results struct that contains a success flag indicating whether the test has passed, as well as extra info gathered from the test that is useful for debugging purposes.

This is what the code might look like:

    void TryAttack()
    {
      FacingData facingData = PrepareFacingData();
      if (!TestFacing(facingData).IsFacingValid())
        return;

      WeaponData weaponData = PrepareWeaponData();
      if (!TestWeaponReady(weaponData).IsWeaponReady())
        return;

      PathData pathData = PreparePathData();
      if (!TestPathClear(pathData).IsPathClear())
        return;

      Attack();
    }

It looks good and all, but there’s one specific issue that technically doesn’t lie in the code itself, but it affects the programmer’s experience when debugging this piece of code.

Embedding the test function’s return value within `if` conditions like this means that if we want to set a breakpoint and peek at the extra info in the results structs, we have to step into the individual tests, step out of them, and then look at the returned values.

Visual Studio supports this feature inside the autos window:

I find it slightly annoying to have to step in and then step out of test functions just to inspect their results structs. So, I normally assign the return values to local variables, and then perform `if` checks on those variables instead.

This way, I can just step over the test function calls and inspect the results structs, without having to step in and out of the test functions:

    void TryAttack()
    {
      FacingData facingData = PrepareFacingData();
      FacingResults facingResults = TestFacing(facingData);
      if (!facingResults.IsFacingValid())
        return;

      WeaponData weaponData = PrepareWeaponData();
      WeaponResults weaponResults = TestWeaponReady(weaponData);
      if (!weaponResults.IsWeaponReady())
        return;

      PathData pathData = PreparePathData();
      PathResults pathResults = TestPathClear(pathData);
      if (!pathResults.IsPathClear())
        return;

      Attack();
    }

Even if we’ve stepped past some tests, extra info regarding the tests is still available in the locals window, which is a convenience I now cannot live without:

As a side note, if the test is just a trivial expression, I think embedding it in the `if` condition is totally fine, and it can actually make the code cleaner.

**NOTE:** For simplicity’s sake, the code mechanisms that strip debug draws from release builds (e.g. `#ifdef`s, macros, flags, etc.) are omitted in this post.

Sometimes we need to debug draw based on test results, to show why a test succeeded or failed. That’s what the results structs returned from test functions are for.

I like to keep debug draw code close to the related condition tests, if not inside the test functions themselves. In my opinion, this makes the code cleaner and easier to read in independent chunks.

It’s perfectly fine if leaving certain debug draw logic inside the test functions themselves makes more sense. However, there are cases where drastically different debug draws are desired at different call sites of the test functions.

My experience tells me that it’s not possible to anticipate how others will use test results for their own debug draws, so I usually put trivial or common debug draw logic inside the test functions, and leave use-case specific debug draw logic to the client code calling the test functions. In the attack example, the `TryAttack` function body is considered client code that uses the test functions.

I generally follow this pattern:

    void TryAttack()
    {
      // facing
      FacingData facingData = PrepareFacingData();
      FacingResults facingResults = TestFacing(facingData);
      if (facingResults.IsFacingValid())
      {
        DebugDrawFacingSuccess(facingResults.GetSuccessInfo());
      }
      else
      {
        DebugDrawFacingFailure(facingResults.GetFailureInfo());
        return;
      }

      // weapon
      WeaponData weaponData = PrepareWeaponData();
      WeaponResults weaponResults = TestWeaponReady(weaponData);
      if (weaponResults.IsWeaponReady())
      {
        DebugDrawWeaponSuccess(weaponResults.GetSuccessInfo());
      }
      else
      {
        DebugDrawWeaponFailure(weaponResults.GetFailureInfo());
        return;
      }

      // path
      PathData pathData = PreparePathData();
      PathResults pathResults = TestPathClear(pathData);
      if (pathResults.IsPathClear())
      {
        DebugDrawPathSuccess(pathResults.GetSuccessInfo());
      }
      else
      {
        DebugDrawPathFailure(pathResults.GetFailureInfo());
        return;
      }

      // final logic
      Attack();
    }

This pattern is, again, in the early-out style.

If we use the single-point-of-return style, the code above can turn into this:

    void TryAttack()
    {
      FacingData facingData = PrepareFacingData();
      FacingResults facingResults = TestFacing(facingData);
      if (facingResults.IsFacingValid())
      {
        DebugDrawFacingSuccess(facingResults.GetSuccessInfo());

        WeaponData weaponData = PrepareWeaponData();
        WeaponResults weaponResults = TestWeaponReady(weaponData);
        if (weaponResults.IsWeaponReady())
        {
          DebugDrawWeaponSuccess(weaponResults.GetSuccessInfo());

          PathData pathData = PreparePathData();
          PathResults pathResults = TestPathClear(pathData);
          if (pathResults.IsPathClear())
          {
            DebugDrawPathSuccess(pathResults.GetSuccessInfo());

            // final logic: only reached when all tests pass
            Attack();
          }
          else
          {
            DebugDrawPathFailure(pathResults.GetFailureInfo());
          }
        }
        else
        {
          DebugDrawWeaponFailure(weaponResults.GetFailureInfo());
        }
      }
      else
      {
        DebugDrawFacingFailure(facingResults.GetFailureInfo());
      }
    }

The call to `DebugDrawFacingFailure` is all the way down inside the bottom `else` block. This is bad in terms of code locality. When I see the call to `DebugDrawFacingFailure` at the end, I have to trace all the way up to find its corresponding condition test.

There are single-point-of-return alternatives that can improve debug draw locality, but it’s still always going to be a challenge to make clean cuts to separate code into chunks that fully contain reference to individual tests. Later test chunks will always need to reference earlier test results.

Sometimes it’s preferable to force debug draws for all test results, even when early tests fail. In this case, we don’t care about the effect of short-circuit evaluation any more.

This is the pattern I follow that adds a flag to force all debug draws, which in turn could be toggled by a debug option:

    void TryAttack(bool forceAllDebugDraws)
    {
      bool anyTestFailed = false;

      // facing
      const FacingResults facingResults = TestFacing();
      if (facingResults.IsFacingValid())
      {
        DebugDrawFacingSuccess(facingResults.GetSuccessInfo());
      }
      else
      {
        DebugDrawFacingFailure(facingResults.GetFailureInfo());
        anyTestFailed = true;
        if (!forceAllDebugDraws)
          return;
      }

      // weapon
      const WeaponResults weaponResults = TestWeaponReady();
      if (weaponResults.IsWeaponReady())
      {
        DebugDrawWeaponSuccess(weaponResults.GetSuccessInfo());
      }
      else
      {
        DebugDrawWeaponFailure(weaponResults.GetFailureInfo());
        anyTestFailed = true;
        if (!forceAllDebugDraws)
          return;
      }

      // path
      const PathResults pathResults = TestPathClear();
      if (pathResults.IsPathClear())
      {
        DebugDrawPathSuccess(pathResults.GetSuccessInfo());
      }
      else
      {
        DebugDrawPathFailure(pathResults.GetFailureInfo());
        anyTestFailed = true;
        if (!forceAllDebugDraws)
          return;
      }

      // we'd only get here with a failed test if all debug draws are forced
      // don't perform attack if any test has failed
      if (anyTestFailed)
        return;

      // final logic
      Attack();
    }

If the flag to force all debug draws is set to true, all condition tests as well as debug draws will be executed. But the final `Attack` function call still wouldn’t be reached if any test has failed, because it’s guarded by a flag keeping track of whether any test has failed.

You might have already foreseen that if the number of tests grows large, we can end up with a lot of duplicated code and logic structure: the use of `anyTestFailed` & `forceAllDebugDraws` inside the `else` blocks, and the `if` statements branching into the success & failure debug draws.

If you’re willing to make a sacrifice to prepare a single master data struct at the start, which is to be passed into all test functions declared with the same signature, plus a master results struct that holds all test info for debug draws, here’s one alternative pattern for your consideration:

    // these are C++ function pointer types
    // if you're using C#, think of them as delegates
    typedef bool (*TestFunc)(const Data &data, Results &results);
    typedef bool (*DebugDrawFunc)(const Results &results);

    struct TestSpec
    {
      TestFunc m_func;
      DebugDrawFunc m_debugDrawSuccess;
      DebugDrawFunc m_debugDrawFailure;
    };

    void TryAttack(bool forceAllDebugDraws)
    {
      // define sets of test function and debug draw functions
      TestSpec testFuncs[] =
      {
        { // facing
          TestFacing,
          DebugDrawFacingSuccess,
          DebugDrawFacingFailure
        },
        { // weapon
          TestWeaponReady,
          DebugDrawWeaponSuccess,
          DebugDrawWeaponFailure
        },
        { // path
          TestPathClear,
          DebugDrawPathSuccess,
          DebugDrawPathFailure
        }
      };

      // iterate through each test
      Data masterData = PrepareMasterData();
      Results masterResults;
      bool anyTestFailed = false;
      for (TestSpec &spec : testFuncs)
      {
        bool success = spec.m_func(masterData, masterResults);
        if (success)
        {
          spec.m_debugDrawSuccess(masterResults);
        }
        else
        {
          spec.m_debugDrawFailure(masterResults);
          anyTestFailed = true;
          if (!forceAllDebugDraws)
            return;
        }
      }

      if (anyTestFailed)
        return;

      Attack();
    }

When a new test function and its success & failure debug draw functions are defined, simply add the function set to the `testFuncs` array. There is only one shared code structure (the range-based `for` loop) that runs the tests, selects the success or failure debug draw functions to call, and optionally performs early outs.

Finally, the length of the `TryAttack` function may grow to a point where its purpose is not trivially clear any more. Recall the refactored variation above where all the condition tests are extracted into a separate `CanAttack` function:

    void TryAttack()
    {
      if (CanAttack())
        Attack();

      PostAttackTry();
    }

This seems like a good change no matter how the conditional tests are done, as it makes the intention of `TryAttack` crystal clear to the reader. I’d do this if readability is compromised due to function length.

That’s it! I’ve shared the coding patterns I think help make code readable and debuggable.

Using the early-out style as foundation, I’ve shown how to write debuggable conditions, achieve debug draw locality, and optionally force all debug draws.

I don’t consider these patterns exciting nor groundbreaking, but I find them very useful, and I hope you do, too.

It occurred to me that the entire time I’ve been working with quaternions, I have never read or learned about the derivation of the formula for slerp, spherical linear interpolation. I just learned the final formula and have been using it.

Upon a preliminary search I couldn’t seem to immediately find a straightforward derivation, either (at least not one that fits in the context of game development). So I thought it might be a fun exercise to derive it myself.

As it turns out, it is indeed fun and could probably serve as an interesting trigonometry & vector quiz question!

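A quick recap: slerp is an operation that interpolates between two vectors along the shortest arc (in any dimension higher than 1D). It takes as input the two vectors to interpolate between, $\mathbf{a}$ and $\mathbf{b}$, plus an interpolation parameter $t$:

$$\mathrm{slerp}(\mathbf{a}, \mathbf{b}, t) = \frac{\sin((1 - t)\Omega)}{\sin\Omega}\,\mathbf{a} + \frac{\sin(t\Omega)}{\sin\Omega}\,\mathbf{b}$$

where $\Omega$ is the angle between the two vectors (for unit vectors):

$$\Omega = \cos^{-1}(\mathbf{a} \cdot \mathbf{b})$$

If the interpolation parameter $t$ changes at a constant rate, the angular velocity of the slerp result is also constant. If $t$ is set to $0.25$, it means the slerp result is “the 25% waypoint” on the arc from $\mathbf{a}$ to $\mathbf{b}$: the angle between $\mathbf{a}$ and the slerp result is $0.25\,\Omega$, and the angle between $\mathbf{b}$ and the slerp result is $0.75\,\Omega$.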

In the context of game development, slerp is typically used to interpolate between orientations represented by quaternions, which can be expressed as 4D vectors. In this case the shortest arc slerp interpolates across lies on a 4D hypersphere.

As mentioned before, this formula can be used on any vectors in any dimension higher than 1D. So it can also be used to interpolate between two 3D vectors along a sphere, or between two 2D vectors along a circle.

In the context of game development, we almost exclusively work with unit quaternions. So in my derivation, I make the assumption that the vectors we are working with are all unit vectors. The flow of the derivation should be pretty much the same even if the vectors are not unit vectors.

Without further ado, here’s the derivation.

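Let $\mathbf{r}$ be the result of slerp:

$$\mathbf{r} = \mathrm{slerp}(\mathbf{a}, \mathbf{b}, t)$$

And let $\Omega$ be the angle between $\mathbf{a}$ and $\mathbf{b}$.

Knowing that the angle between $\mathbf{a}$ and $\mathbf{r}$ is $t\Omega$, and the angle between $\mathbf{r}$ and $\mathbf{b}$ is $(1 - t)\Omega$, we can come up with this figure:

Here’s the strategy. We build a pair of orthogonal axes $\mathbf{e}_1$ and $\mathbf{e}_2$ from $\mathbf{a}$ and $\mathbf{b}$. Then, we use the parametric circle formula to find $\mathbf{r}$:

$$\mathbf{r} = \cos(t\Omega)\,\mathbf{e}_1 + \sin(t\Omega)\,\mathbf{e}_2$$

Since $\mathbf{a}$ is already a unit vector that conveniently lies on the horizontal axis in the figure, let’s just pick $\mathbf{e}_1 = \mathbf{a}$. So then $\mathbf{e}_2$ can be found by taking away the component in $\mathbf{b}$ that is parallel to $\mathbf{a}$ and normalizing the remainder:

$$\mathbf{e}_2 = \frac{\mathbf{b} - (\mathbf{a} \cdot \mathbf{b})\,\mathbf{a}}{\lVert \mathbf{b} - (\mathbf{a} \cdot \mathbf{b})\,\mathbf{a} \rVert} = \frac{\mathbf{b} - \cos\Omega\,\mathbf{a}}{\sin\Omega}$$

The denominator simplifies because $\mathbf{a} \cdot \mathbf{b} = \cos\Omega$ for unit vectors, so $\lVert \mathbf{b} - \cos\Omega\,\mathbf{a} \rVert^2 = 1 - 2\cos^2\Omega + \cos^2\Omega = \sin^2\Omega$.

Now plug $\mathbf{e}_1$ and $\mathbf{e}_2$ back into the parametric circle formula:

$$\mathbf{r} = \cos(t\Omega)\,\mathbf{a} + \sin(t\Omega)\,\frac{\mathbf{b} - \cos\Omega\,\mathbf{a}}{\sin\Omega} = \frac{\sin\Omega\cos(t\Omega) - \cos\Omega\sin(t\Omega)}{\sin\Omega}\,\mathbf{a} + \frac{\sin(t\Omega)}{\sin\Omega}\,\mathbf{b} = \frac{\sin((1 - t)\Omega)}{\sin\Omega}\,\mathbf{a} + \frac{\sin(t\Omega)}{\sin\Omega}\,\mathbf{b}$$

And voila! We have our slerp formula.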
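As a quick sanity check of the formula, here is a minimal C++ sketch of slerp for unit vectors of any dimension. The function name, the `std::array` representation, and the small-angle fallback to a plain lerp are assumptions made for this illustration; they are not taken from any particular engine.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>

// Slerp between two unit vectors a and b of dimension N (N >= 2).
// Antiparallel inputs have no unique shortest arc and are not handled here.
template <std::size_t N>
std::array<double, N> Slerp(const std::array<double, N> &a,
                            const std::array<double, N> &b,
                            double t)
{
  double dot = 0.0;
  for (std::size_t i = 0; i < N; ++i)
    dot += a[i] * b[i];

  // clamp to guard acos against rounding error
  dot = std::max(-1.0, std::min(1.0, dot));

  const double omega = std::acos(dot);
  const double sinOmega = std::sin(omega);

  std::array<double, N> r{};

  // sin(omega) is nearly zero: fall back to a plain lerp
  // to avoid dividing by it
  if (sinOmega < 1.0e-9)
  {
    for (std::size_t i = 0; i < N; ++i)
      r[i] = (1.0 - t) * a[i] + t * b[i];
    return r;
  }

  // weights from the slerp formula
  const double wa = std::sin((1.0 - t) * omega) / sinOmega;
  const double wb = std::sin(t * omega) / sinOmega;
  for (std::size_t i = 0; i < N; ++i)
    r[i] = wa * a[i] + wb * b[i];

  return r;
}
```

For example, interpolating from the 2D vector (1, 0) to (0, 1) with `t = 0.25` gives a unit vector one quarter of the way along the 90-degree arc.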

*Edit: Eric Lengyel has pointed out there’s another way to derive the slerp formula using similar triangles, presented in his Mathematics for 3D Game Programming and Computer Graphics, 3rd ed., Section 4.6.3.*

Source files are on GitHub.

Shortcut to sterp implementation.

Shortcut to code used to generate animations in this post.

Slerp, spherical linear interpolation, is an operation that interpolates from one orientation to another, using a rotational axis paired with the smallest angle possible.

Quick note: Jonathan Blow explains here how you should avoid using slerp, if normalized quaternion linear interpolation (nlerp) suffices. Long story short, nlerp is faster but does not maintain constant angular velocity, while slerp is slower but maintains constant angular velocity; use nlerp if you’re interpolating across small angles or you don’t care about constant angular velocity; use slerp if you’re interpolating across large angles and you care about constant angular velocity. But for the sake of using a more commonly known and used building block, the rest of this post will only mention slerp. Replacing all following occurrences of slerp with nlerp would not change the validity of this post.
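For reference, nlerp can be sketched in a few lines of C++. The `Quat` struct and the `Nlerp` name are assumptions for this illustration rather than any engine's API, and the inputs are assumed to be unit quaternions.

```cpp
#include <cmath>

struct Quat { double x, y, z, w; };

// Normalized linear interpolation between unit quaternions a and b.
Quat Nlerp(const Quat &a, Quat b, double t)
{
  // q and -q represent the same orientation;
  // flip b if needed so that we interpolate along the shorter path
  const double dot = a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
  if (dot < 0.0)
    b = { -b.x, -b.y, -b.z, -b.w };

  // component-wise lerp
  const Quat r =
  {
    a.x + t * (b.x - a.x),
    a.y + t * (b.y - a.y),
    a.z + t * (b.z - a.z),
    a.w + t * (b.w - a.w)
  };

  // renormalize so the result is a unit quaternion again
  const double len = std::sqrt(r.x * r.x + r.y * r.y + r.z * r.z + r.w * r.w);
  return { r.x / len, r.y / len, r.z / len, r.w / len };
}
```

By symmetry, nlerp and slerp agree at `t = 0.5`, but at other values of `t` the nlerp result sits at a slightly different angle along the arc, which is the non-constant angular velocity mentioned above.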

In general, slerp is considered superior over interpolating individual components of Euler angles, as the latter method usually yields orientational sways.

But, sometimes slerp might not be ideal. Look at the image below showing two different orientations of a rod. On the left is one orientation, and on the right is the resulting orientation of rotating around the axis shown as a cyan arrow, where the pivot is at one end of the rod.

If we slerp between the two orientations, this is what we get:

Mathematically, slerp takes the “shortest rotational path”. The quaternion representing the rod’s orientation travels along the shortest arc on a 4D hypersphere. But, given the rod’s elongated appearance, the rod’s moving end seems to be deviating from the shortest arc on a 3D sphere.

My intended effect here is for the rod’s moving end to travel along the shortest arc in 3D, like this:

The difference is more obvious if we compare them side-by-side:

This is where swing-twist decomposition comes in.

Swing-Twist decomposition is an operation that splits a rotation into two concatenated rotations, swing and twist. Given a twist axis, we would like to separate out the portion of a rotation that contributes to the twist around this axis, and what’s left behind is the remaining swing portion.

There are multiple ways to derive the formulas, but this particular one by Michaele Norel seems to be the most elegant and efficient, and it’s the only one I’ve come across that does not involve any use of trigonometry functions. I will first show the formulas now and then paraphrase his proof later:

Given a rotation represented by a quaternion $q = [w, \mathbf{r}]$ and a twist axis $\mathbf{a}$ (taken here as a unit vector), combine the scalar part $w$ of $q$ with the projection of $\mathbf{r}$ onto $\mathbf{a}$ to form a new quaternion:

$$T = [w, (\mathbf{r} \cdot \mathbf{a})\,\mathbf{a}]$$

We want to decompose $q$ into a swing component and a twist component. Let $S$ denote the swing component, so we can write $q = ST$. The swing component is then calculated by multiplying $q$ with the inverse (conjugate) of $T$:

$$S = q\,T^{-1}$$

Beware that $S$ and $T$ are not yet normalized at this point. It’s a good idea to normalize them before use, as unit quaternions are just cuter.

Below is my code implementation of swing-twist decomposition. Note that it also takes care of the singularity that occurs when the rotation to be decomposed represents a 180-degree rotation.

    public static void DecomposeSwingTwist
    (
      Quaternion q,
      Vector3 twistAxis,
      out Quaternion swing,
      out Quaternion twist
    )
    {
      Vector3 r = new Vector3(q.x, q.y, q.z);

      // singularity: rotation by 180 degree
      if (r.sqrMagnitude < MathUtil.Epsilon)
      {
        Vector3 rotatedTwistAxis = q * twistAxis;
        Vector3 swingAxis = Vector3.Cross(twistAxis, rotatedTwistAxis);

        if (swingAxis.sqrMagnitude > MathUtil.Epsilon)
        {
          float swingAngle = Vector3.Angle(twistAxis, rotatedTwistAxis);
          swing = Quaternion.AngleAxis(swingAngle, swingAxis);
        }
        else
        {
          // more singularity:
          // rotation axis parallel to twist axis
          swing = Quaternion.identity; // no swing
        }

        // always twist 180 degree on singularity
        twist = Quaternion.AngleAxis(180.0f, twistAxis);
        return;
      }

      // meat of swing-twist decomposition
      Vector3 p = Vector3.Project(r, twistAxis);
      twist = new Quaternion(p.x, p.y, p.z, q.w);
      twist = Normalize(twist);
      swing = q * Quaternion.Inverse(twist);
    }

Now that we have the means to decompose a rotation into swing and twist components, we need a way to use them to interpolate the rod’s orientation, replacing slerp.

Replacing slerp with the swing and twist components is actually pretty straightforward. Let $q_0$ and $q_1$ denote the quaternions representing the rod’s two orientations we are interpolating between. Given the interpolation parameter $t$, we use it to find “fractions” of the swing and twist components and combine them together. Such fractions can be obtained by performing slerp from the identity quaternion to the individual components.

So we replace:

with:

From the rod example, we choose the twist axis to align with the rod’s longest side. Let’s look at the effect of the individual swing and twist components as $t$ varies over time below, swing on left and twist on right:

And as we concatenate these two components together, we get a swing-twist interpolation that rotates the rod such that its moving end travels in the shortest arc in 3D. Again, here is a side-by-side comparison of slerp (left) and swing-twist interpolation (right):

I decided to name my swing-twist interpolation function **sterp**. I think it’s cool because it sounds like it belongs to the function family of **lerp** and **slerp**. Here’s to hoping that this name catches on.

And here’s my code implementation:

    public static Quaternion Sterp
    (
      Quaternion a,
      Quaternion b,
      Vector3 twistAxis,
      float t
    )
    {
      Quaternion deltaRotation = b * Quaternion.Inverse(a);

      Quaternion swingFull;
      Quaternion twistFull;
      QuaternionUtil.DecomposeSwingTwist
      (
        deltaRotation,
        twistAxis,
        out swingFull,
        out twistFull
      );

      Quaternion swing = Quaternion.Slerp(Quaternion.identity, swingFull, t);
      Quaternion twist = Quaternion.Slerp(Quaternion.identity, twistFull, t);

      return twist * swing;
    }

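Lastly, let’s look at the proof for the swing-twist decomposition formulas. Write the rotation as $q = [w, \mathbf{r}]$, the twist axis as the unit vector $\mathbf{a}$, the twist component as $T$, and the swing component as $S = q\,T^{-1}$. All that needs to be proven is that the swing component $S$ does not contribute to any rotation around the twist axis, i.e. the rotational axis of $S$ is orthogonal to the twist axis.

Let $\mathbf{r}_\parallel$ denote the parallel component of $\mathbf{r}$ to $\mathbf{a}$, which can be obtained by projecting $\mathbf{r}$ onto $\mathbf{a}$:

$$\mathbf{r}_\parallel = (\mathbf{r} \cdot \mathbf{a})\,\mathbf{a}$$

Let $\mathbf{r}_\perp$ denote the orthogonal component of $\mathbf{r}$ to $\mathbf{a}$:

$$\mathbf{r}_\perp = \mathbf{r} - \mathbf{r}_\parallel$$

So the scalar-vector form of $T$ becomes:

$$T = [w, \mathbf{r}_\parallel]$$

Using the quaternion multiplication formula, here is the scalar-vector form of the swing quaternion. The conjugate of $T$ is used in place of its inverse; the two differ only by a positive scale factor, which does not change the rotational axis:

$$S \propto [w, \mathbf{r}]\,[w, -\mathbf{r}_\parallel] = [\,w^2 + \mathbf{r} \cdot \mathbf{r}_\parallel,\; w\,\mathbf{r} - w\,\mathbf{r}_\parallel - \mathbf{r} \times \mathbf{r}_\parallel\,] = [\,w^2 + \mathbf{r} \cdot \mathbf{r}_\parallel,\; w\,\mathbf{r}_\perp + \mathbf{r}_\parallel \times \mathbf{r}_\perp\,]$$

The last step uses $\mathbf{r} = \mathbf{r}_\parallel + \mathbf{r}_\perp$ and $\mathbf{r}_\parallel \times \mathbf{r}_\parallel = \mathbf{0}$. Take notice of the vector part of the result:

$$w\,\mathbf{r}_\perp + \mathbf{r}_\parallel \times \mathbf{r}_\perp$$

This is a vector parallel to the rotational axis of $S$. Both $\mathbf{r}_\perp$ and $\mathbf{r}_\parallel \times \mathbf{r}_\perp$ are orthogonal to the twist axis $\mathbf{a}$, so we have shown that the rotational axis of $S$ is orthogonal to the twist axis. Hence, we have proven that the formulas for $S$ and $T$ are valid for swing-twist decomposition.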
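The orthogonality claim can also be checked numerically. The following C++ sketch is a stripped-down port of the decomposition without the singularity handling; the `Vec3` and `Quat` types and the helper names are assumptions for this illustration, and the twist axis is assumed to be a unit vector.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };
struct Quat { double w; Vec3 v; }; // scalar-vector form [w, v]

double Dot(const Vec3 &a, const Vec3 &b)
{
  return a.x * b.x + a.y * b.y + a.z * b.z;
}

Vec3 Cross(const Vec3 &a, const Vec3 &b)
{
  return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

// [s1, v1][s2, v2] = [s1 s2 - v1.v2, s1 v2 + s2 v1 + v1 x v2]
Quat Mul(const Quat &p, const Quat &q)
{
  const Vec3 c = Cross(p.v, q.v);
  return
  {
    p.w * q.w - Dot(p.v, q.v),
    { p.w * q.v.x + q.w * p.v.x + c.x,
      p.w * q.v.y + q.w * p.v.y + c.y,
      p.w * q.v.z + q.w * p.v.z + c.z }
  };
}

Quat Conjugate(const Quat &q)
{
  return { q.w, { -q.v.x, -q.v.y, -q.v.z } };
}

Quat Normalize(const Quat &q)
{
  const double len = std::sqrt(q.w * q.w + Dot(q.v, q.v));
  return { q.w / len, { q.v.x / len, q.v.y / len, q.v.z / len } };
}

// Assumes q is a unit quaternion, twistAxis is a unit vector,
// and the twist is not degenerate (w and the projection are not both zero).
void DecomposeSwingTwist(const Quat &q, const Vec3 &twistAxis, Quat &swing, Quat &twist)
{
  const double d = Dot(q.v, twistAxis);
  const Quat t = { q.w, { d * twistAxis.x, d * twistAxis.y, d * twistAxis.z } };
  twist = Normalize(t);

  // for a unit quaternion, the inverse is the conjugate
  swing = Mul(q, Conjugate(twist));
}
```

Decomposing an arbitrary unit quaternion this way should yield a swing whose vector part has a zero dot product with the twist axis, and multiplying swing by twist should give back the original rotation.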

That’s all.

Given a twist axis, I have shown how to decompose a rotation into a swing component and a twist component.

Such decomposition can be used for swing-twist interpolation, an alternative to slerp that interpolates between two orientations, which can be useful if you’d like some point on a rotating object to travel along the shortest arc.

I like to call such interpolation **sterp**.

Sterp is merely an alternative to slerp, not a replacement. Also, slerp is definitely more efficient than sterp. Most of the time slerp should work just fine, but if you find unwanted orientational sway on an object’s moving end, you might want to give sterp a try.

An application of swing-twist decomposition in 2D just came to mind.

If the twist axis is chosen to be orthogonal to the screen, then we can utilize swing-twist decomposition to use the orientation of objects in 3D to drive the rotation of 2D elements in screen space or some other data. The twist component represents exactly the portion of 3D rotation projected onto screen space.

However, in terms of performance, we might be better off just projecting a 3D object’s local axis onto screen space and find the angle between it and a screen space axis. But then again, the swing-twist decomposition approach doesn’t have the singularity the projection approach has when the chosen local axis becomes orthogonal to the screen.
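As a rough illustration of the 2D idea, the twist angle around a chosen axis can be read directly off the quaternion without building the swing component. The C++ sketch below uses hypothetical names (`Quat`, `Vec3`, `TwistAngle`) and assumes a unit quaternion and a unit axis, e.g. the screen normal.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };
struct Quat { double x, y, z, w; };

// Signed twist angle (in radians) of unit quaternion q around a unit axis.
// The twist quaternion is proportional to [w, (r . axis) axis], so its half
// angle is atan2(r . axis, w). When both are zero, std::atan2 returns 0.
double TwistAngle(const Quat &q, const Vec3 &axis)
{
  const double d = q.x * axis.x + q.y * axis.y + q.z * axis.z;
  return 2.0 * std::atan2(d, q.w);
}
```

The returned angle could then drive the rotation of a 2D element. Since it can fall outside the range of one full turn, wrapping it into a preferred range is left to the caller.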

This post is part of my Game Programming Series.

Complete source code for the debug draw utility and Unity scene for generating the demo animation above can be found on GitHub. Here is a shortcut to the debug draw utility class. And here is a shortcut to the shaders.

A couple weeks ago, I documented how I implemented a wireframe Unity debug draw utility using cached mesh pools and vertex shaders.

Recently, I have upgraded the utility to now support various shaded styles, including solid color, flat-shaded, and smooth-shaded. This post is a documentation of my development process and how I solved some of the challenges on the way.

For each mesh rendered in wireframe style, the original mesh factory only needed to generate an array of unique vertices, along with an index array containing the vertex indices in either lines or line strip topology.

To generate a mesh to be rendered in solid color style, I reused the same unique vertex arrays, but the index arrays had to be changed to contain vertex indices in triangle topology, three indices per triangle.

Once the generation of meshes for solid color style was done, I decided counter-intuitively to first implement the “fancier” smooth-shaded style before the flat-shaded style, because the former was actually an easier incremental change from the solid color style. Taking spheres for example, the vertex array actually still didn’t need to be changed; I just had to create an array of normals that is the exact copy of the vertices. Recall from the previous post that in order to reduce numbers of cached meshes, I offloaded scaling to the vertex shaders and just generated meshes that are unit primitives. The normal of a vertex of a smooth-shaded unit sphere is just conveniently identical to the vertex positional vector.
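To illustrate the sphere case, here is a small C++ sketch (the utility itself is written in C#) that builds the unique vertices of a unit UV sphere and copies them as normals. The names, the latitude/longitude layout, and the Y-up convention are assumptions for this illustration, and the index array is omitted.

```cpp
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

struct SphereMesh
{
  std::vector<Vec3> vertices;
  std::vector<Vec3> normals;
};

// Unique vertices of a unit sphere centered at the origin:
// two poles plus (numLatSegments - 1) rings of numLongSegments vertices.
SphereMesh MakeSmoothUnitSphere(int numLatSegments, int numLongSegments)
{
  const float kPi = 3.14159265f;

  SphereMesh mesh;
  mesh.vertices.push_back({ 0.0f, 1.0f, 0.0f }); // north pole

  for (int iLat = 1; iLat < numLatSegments; ++iLat)
  {
    const float polar = kPi * iLat / numLatSegments;
    for (int iLong = 0; iLong < numLongSegments; ++iLong)
    {
      const float azimuth = 2.0f * kPi * iLong / numLongSegments;
      mesh.vertices.push_back(
      {
        std::sin(polar) * std::cos(azimuth),
        std::cos(polar),
        std::sin(polar) * std::sin(azimuth)
      });
    }
  }

  mesh.vertices.push_back({ 0.0f, -1.0f, 0.0f }); // south pole

  // on a unit sphere centered at the origin,
  // each smooth normal equals the vertex position
  mesh.normals = mesh.vertices;

  return mesh;
}
```

This shortcut only holds because the mesh is a unit primitive; once scaling is applied in the shader, the normals need their own transform.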

Figuring out the index arrays for other smooth-shaded primitive meshes wasn’t as straightforward as spheres, but it wasn’t too hard either. I still didn’t need to change most of the vertex arrays and just had to figure out the proper accompanying normal array and index array. Cones were a notable exception, because even with smooth-shaded style, they still have some normal discontinuity along the base edges, which required duplicates of the base edge vertices with different normals.

Finally moving onto the flat-shaded style, most primitives required me to modify the generation of vertex arrays, normal arrays, and index arrays. Arrays of unique vertices no longer worked, because a vertex shared by multiple faces (triangles, quads, circles, etc.) would have a different normal on each face. For each face, a new set of vertices had to be put into the vertex array. Different primitives required slightly different techniques to generate the vertices for each face. Taking spheres for example again, for each longitudinal strip, two triangles connecting to the poles plus two triangles per quad along the strip were needed. The normals were simply computed with cross products of any two non-parallel vectors connecting vertices in each face.

I generally followed this pattern for triangles:

    Vector3[] aVert = new Vector3[numVerts];
    Vector3[] aNormal = new Vector3[numNormals];
    int[] aIndex = new int[numIndices];

    int iVert = 0;
    int iNormal = 0;
    int iIndex = 0;
    for (int i = 0; i < numIterations; ++i)
    {
      int iTriStart = iVert;

      aVert[iVert++] = ComputeTriVert0(i);
      aVert[iVert++] = ComputeTriVert1(i);
      aVert[iVert++] = ComputeTriVert2(i);

      Vector3 tri01 = aVert[iTriStart + 1] - aVert[iTriStart];
      Vector3 tri02 = aVert[iTriStart + 2] - aVert[iTriStart];
      Vector3 triNormal = Vector3.Cross(tri01, tri02).normalized;

      aNormal[iNormal++] = triNormal;
      aNormal[iNormal++] = triNormal;
      aNormal[iNormal++] = triNormal;

      aIndex[iIndex++] = iTriStart;
      aIndex[iIndex++] = iTriStart + 1;
      aIndex[iIndex++] = iTriStart + 2;
    }

And this pattern for quads:

```csharp
Vector3[] aVert = new Vector3[numVerts];
Vector3[] aNormal = new Vector3[numNormals];
int[] aIndex = new int[numIndices];

int iVert = 0;
int iNormal = 0;
int iIndex = 0;
for (int i = 0; i < numIterations; ++i)
{
  int iQuadStart = iVert;

  aVert[iVert++] = ComputeQuadVert0(i);
  aVert[iVert++] = ComputeQuadVert1(i);
  aVert[iVert++] = ComputeQuadVert2(i);
  aVert[iVert++] = ComputeQuadVert3(i);

  // face normal from two edges sharing the first vertex
  Vector3 quad01 = aVert[iQuadStart + 1] - aVert[iQuadStart];
  Vector3 quad02 = aVert[iQuadStart + 2] - aVert[iQuadStart];
  Vector3 quadNormal = Vector3.Cross(quad01, quad02).normalized;

  aNormal[iNormal++] = quadNormal;
  aNormal[iNormal++] = quadNormal;
  aNormal[iNormal++] = quadNormal;
  aNormal[iNormal++] = quadNormal;

  // two triangles per quad
  aIndex[iIndex++] = iQuadStart;
  aIndex[iIndex++] = iQuadStart + 1;
  aIndex[iIndex++] = iQuadStart + 2;

  aIndex[iIndex++] = iQuadStart;
  aIndex[iIndex++] = iQuadStart + 2;
  aIndex[iIndex++] = iQuadStart + 3;
}
```

The positional portion of the vertex shader for all styles is actually identical, so I wanted to find a way to avoid creating an extra set of vertex and fragment shaders just to add the logic for normals. Then I found out about Unity’s shader variant feature. By using the `shader_feature` keyword and `#ifdef`s in the shaders, combined with the `Material.EnableKeyword` method, I was able to choose from a collection of variants generated from a single master shader at run time for each primitive mesh type. I used the `NORMAL_ON` keyword for the normal feature.

As shown below, normals are included in the vertex structs only when the `NORMAL_ON` keyword is enabled.

```hlsl
#pragma shader_feature NORMAL_ON

struct appdata
{
  float4 vertex : POSITION;
  #ifdef NORMAL_ON
  float3 normal : NORMAL;
  #endif
};

struct v2f
{
  float4 vertex : SV_POSITION;
  #ifdef NORMAL_ON
  float3 normal : NORMAL;
  #endif
};
```

The model-view matrix is used to transform vertex positions from object space into view space, but normals need to be transformed using the inverse transpose of the model-view matrix. Since the scaling is offloaded to the shader, I needed to fold in the scaling portion of the inverse transpose of the model-view matrix myself.

```hlsl
v2f vert (appdata v)
{
  v2f o;
  // ...

  #ifdef NORMAL_ON
  // inverse transpose of the scale matrix (diagonal, so just the reciprocals)
  float4x4 scaleInverseTranspose = float4x4
  (
    1.0f / _Dimensions.x, 0.0f, 0.0f, 0.0f,
    0.0f, 1.0f / _Dimensions.y, 0.0f, 0.0f,
    0.0f, 0.0f, 1.0f / _Dimensions.z, 0.0f,
    0.0f, 0.0f, 0.0f, 1.0f
  );
  float4x4 m = mul(UNITY_MATRIX_IT_MV, scaleInverseTranspose);
  o.normal = mul(m, float4(v.normal, 0.0f)).xyz;
  #endif

  return o;
}
```
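To see why the reciprocal scale is the right thing to apply to normals, here is a minimal CPU-side sketch in C++. It is not part of the debug draw utility; `Vec3`, `ScalePosition`, and `ScaleNormal` are hypothetical names I made up for illustration, and it only covers the pure-scale case (no rotation).

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

float Dot(const Vec3 &a, const Vec3 &b)
{
  return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Positions (and therefore surface tangents) are scaled component-wise.
Vec3 ScalePosition(const Vec3 &v, const Vec3 &s)
{
  return { v.x * s.x, v.y * s.y, v.z * s.z };
}

// Normals use the inverse transpose of the scale matrix. A diagonal matrix
// is its own transpose, so this reduces to the reciprocal scale.
Vec3 ScaleNormal(const Vec3 &n, const Vec3 &s)
{
  assert(s.x != 0.0f && s.y != 0.0f && s.z != 0.0f);
  return { n.x / s.x, n.y / s.y, n.z / s.z };
}
```

With a tangent `(1, -1, 0)`, a normal `(1, 1, 0)`, and a scale of `(2, 1, 1)`, scaling the normal like a position leaves it no longer perpendicular to the scaled tangent, while `ScaleNormal` keeps the two perpendicular.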

I also used the `shader_feature` keyword to optionally activate the “cap shift/scaling” logic for cylinders and capsules. Recall from the previous post that in order not to generate a mesh for each possible height, only unit-height cylinder and capsule meshes are generated, and the caps are shifted towards the X-Z plane, scaled, and then shifted back to the final height. I used the `CAP_SHIFT_SCALE` keyword for this feature.

```hlsl
#pragma shader_feature CAP_SHIFT_SCALE

// (x, y, z) == (dimensionX, dimensionY, dimensionZ)
// w == capShiftScale
// (shift 0.5 towards the X-Z plane, scale by dimensions,
// and then shift back 0.5 * capShiftScale)
float4 _Dimensions;

v2f vert (appdata v)
{
  v2f o;

  #ifdef CAP_SHIFT_SCALE
  const float ySign = sign(v.vertex.y);
  v.vertex.y -= ySign * 0.5f;
  #endif

  v.vertex.xyz *= _Dimensions.xyz;

  #ifdef CAP_SHIFT_SCALE
  v.vertex.y += ySign * 0.5f * _Dimensions.w;
  #endif

  o.vertex = UnityObjectToClipPos(v.vertex);
  // ...

  return o;
}
```

I noticed some Z-fighting between the two styles when I drew the same meshes twice, once in wireframe style and once in shaded style. It was actually an easy fix. I just added a small Z-bias to make sure the wireframe lines are always drawn in front of the shaded pixels.

```hlsl
float _ZBias;

v2f vert (appdata v)
{
  v2f o;
  // ...

  o.vertex = UnityObjectToClipPos(v.vertex);
  o.vertex.z += _ZBias;
  // ...

  return o;
}
```

And finally here’s the fragment shader. It really doesn’t contain anything out of the ordinary, except that it remaps the vertex brightness from (0.0, 1.0) to (0.3, 1.0), because I really don’t like completely black pixels.

```hlsl
fixed4 frag (v2f i) : SV_Target
{
  fixed4 color = _Color;

  #ifdef NORMAL_ON
  i.normal = normalize(i.normal);
  color.rgb *= 0.7f * i.normal.z + 0.3f; // darkest at 0.3f
  #endif

  return color;
}
```

That’s it! I am pretty satisfied with the current Unity debug draw utility. It’s also easy to combine primitives to make more interesting shapes, such as the arrows shown in the demo animation above.

Potentially, the meshes for flat-shaded and smooth-shaded styles, generated from the mesh factory, can be used to implement a gizmo utility. But I’ll probably only do it when I really need it.

Stay tuned for more documentation of my future venture into Unity land.

Until next time!

This post is part of my Game Programming Series.

Complete source code for the debug draw utility and Unity scene for generating the demo animation above can be found on GitHub. Here is a shortcut to the debug draw utility class. And here is a shortcut to the shaders.

I’ve recently started picking up Unity, and quickly found out that the only easily accessible debug draw function is `Debug.DrawLine`, unless I was mistaken (in which case please do let me know).

So I thought it was a good opportunity to familiarize myself with Unity’s environment and a great exercise to implement a debug draw utility that draws various primitives, including rectangles, boxes, spheres, cylinders, and capsules. This post is essentially a quick documentation of what I have done and problems I’ve encountered.

As my first iteration, I took the naive approach and just wrote functions that internally make a bunch of calls to `Debug.DrawLine`. You can see this first attempt here in the history.

The majority of the time spent was pretty much figuring out the right math, so nothing special. I guess the only thing worth pointing out is how I arranged the loops in the functions for spheres and capsules. My first instinct was to draw “from top to bottom”, looping from one pole to the other and constructing rings of line segments along the way, with special cases at the poles handled outside the loop. However, I didn’t like the idea of part of the math outside the loop, as it didn’t feel elegant enough (note: this is just my personal preference). So I came up with a different way of doing it, where I “assemble identical longitudinal pieces” around the central axis that connects the poles. In this case, there are no special cases outside the loop body.
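Here is a minimal C++ sketch of the “assemble identical longitudinal pieces” idea for a unit wireframe sphere. It is not the code from my repository; `Vec3`, `Segment`, `SpherePoint`, and `BuildWireSphere` are hypothetical names, and the resulting segments stand in for individual `Debug.DrawLine` calls. Each longitudinal strip contributes its piece of the meridian plus the ring pieces that connect it to the next strip, so the poles need no handling outside the loops.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };
struct Segment { Vec3 a, b; };

// Point on the unit sphere. iLat = 0 is the +Y pole, iLat = latSegments the -Y pole.
Vec3 SpherePoint(int iLat, int iLong, int latSegments, int longSegments)
{
  const float kPi = 3.14159265f;
  const float theta = kPi * iLat / latSegments;        // polar angle from +Y
  const float phi = 2.0f * kPi * iLong / longSegments; // angle around the Y axis
  return
  {
    std::sin(theta) * std::cos(phi),
    std::cos(theta),
    std::sin(theta) * std::sin(phi)
  };
}

std::vector<Segment> BuildWireSphere(int latSegments, int longSegments)
{
  assert(latSegments >= 2 && longSegments >= 3);

  std::vector<Segment> segments;
  for (int iLong = 0; iLong < longSegments; ++iLong)
  {
    for (int iLat = 0; iLat < latSegments; ++iLat)
    {
      // piece of the meridian running from pole to pole
      segments.push_back
      ({
        SpherePoint(iLat, iLong, latSegments, longSegments),
        SpherePoint(iLat + 1, iLong, latSegments, longSegments)
      });

      // piece of the ring connecting this strip to the next one
      // (the ring at iLat == 0 collapses into the pole, so it is skipped)
      if (iLat > 0)
      {
        segments.push_back
        ({
          SpherePoint(iLat, iLong, latSegments, longSegments),
          SpherePoint(iLat, iLong + 1, latSegments, longSegments)
        });
      }
    }
  }
  return segments;
}
```

Every strip produces `2 * latSegments - 1` segments, so the whole sphere takes `longSegments * (2 * latSegments - 1)` line draws.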

After my first attempt, I got curious as to how other people debug draw spheres in Unity, and I came across this gist. This is when it occurred to me that I can get better performance by caching the mathematical results into meshes, and then simply draw the cached meshes, as well as offloading some of the work onto the GPU with vertex shaders.

There are a bunch of primitives in my debug draw utility, so I won’t enumerate every single one of them. I’ll just use the capsule as an example.

I didn’t want to create a new mesh for every single combination of height, radius, latitudinal segments, and longitudinal segments, because there are so many possible combinations of floats that it’s impractical. Instead, I used just the latitudinal and longitudinal segments to generate a key for each cached mesh, and modified the vertices in the vertex shader with height and radius as shader input.

```csharp
private static Dictionary<int, Mesh> s_meshPool;
private static Material s_material;
private static MaterialPropertyBlock s_matProperties;

public static void DrawCapsule(...)
{
  if (latSegments <= 0 || longSegments <= 1)
    return;

  if (s_meshPool == null)
    s_meshPool = new Dictionary<int, Mesh>();

  // pack both segment counts into a single key
  int meshKey = (latSegments << 16) ^ longSegments;
  Mesh mesh;
  if (!s_meshPool.TryGetValue(meshKey, out mesh))
  {
    mesh = new Mesh();

    // ...

    s_meshPool.Add(meshKey, mesh);
  }

  if (s_material == null)
  {
    s_material = new Material(Shader.Find("CjLib/CapsuleWireframe"));
  }

  if (s_matProperties == null)
    s_matProperties = new MaterialPropertyBlock();

  s_matProperties.SetColor("_Color", color);
  s_matProperties.SetVector("_Dimensions", new Vector4(height, radius));

  Graphics.DrawMesh(mesh, center, rotation, s_material, 0, null, 0, s_matProperties);
}
```
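The caching idea itself is engine-agnostic. Below is a minimal C++ sketch of it, where `Mesh` is just a stand-in struct and `MakeMeshKey` and `MeshPool` are hypothetical names. It assumes both segment counts fit in 16 bits, which is what makes the packed key unique.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Stand-in for a real mesh; only remembers what it was built from.
struct Mesh { int latSegments; int longSegments; };

// Packs both segment counts into one key (16 bits each).
std::uint32_t MakeMeshKey(int latSegments, int longSegments)
{
  assert(latSegments > 0 && latSegments < (1 << 16));
  assert(longSegments > 0 && longSegments < (1 << 16));
  return (static_cast<std::uint32_t>(latSegments) << 16)
       ^ static_cast<std::uint32_t>(longSegments);
}

class MeshPool
{
  public:
    // Returns the cached mesh for this segment combination,
    // building it only on first use.
    const Mesh &Get(int latSegments, int longSegments)
    {
      const std::uint32_t key = MakeMeshKey(latSegments, longSegments);
      auto it = m_pool.find(key);
      if (it == m_pool.end())
      {
        ++m_numBuilt;
        it = m_pool.emplace(key, Mesh{ latSegments, longSegments }).first;
      }
      return it->second;
    }

    int NumBuilt() const { return m_numBuilt; }

  private:
    std::unordered_map<std::uint32_t, Mesh> m_pool;
    int m_numBuilt = 0;
};
```

Height and radius never enter the key, so drawing a thousand capsules of different sizes with the same segment counts still builds only one mesh.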

And below is the vertex shader. I basically shift each cap towards the center, scale the vertices using the radius, and push them back out using the height. I used the `sign` function to effectively branch on which side of the X-Z plane the vertices are on, without actually introducing a code branch in the shader.

```hlsl
float4 _Dimensions; // (height, radius, *, *)

v2f vert (appdata v)
{
  v2f o;
  float ySign = sign(v.vertex.y);
  v.vertex.y -= ySign * 0.5f;
  v.vertex.xyz *= _Dimensions.y;
  v.vertex.y += ySign * 0.5f * _Dimensions.x;
  o.vertex = UnityObjectToClipPos(v.vertex);
  return o;
}
```
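If it helps to step through the math off the GPU, here is the same shift/scale/shift logic as a small C++ sketch. `Vec3`, `Sign`, and `ShiftScaleCapsuleVertex` are hypothetical names, and I am assuming the cached mesh has unit radius with its cap centers at y = ±0.5, which is what the shader above implies.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Mirrors HLSL sign(): returns -1, 0, or +1.
float Sign(float v)
{
  return static_cast<float>((v > 0.0f) - (v < 0.0f));
}

Vec3 ShiftScaleCapsuleVertex(Vec3 v, float height, float radius)
{
  assert(height >= 0.0f && radius >= 0.0f);
  const float ySign = Sign(v.y);
  v.y -= ySign * 0.5f;          // shift the cap center onto the X-Z plane
  v.x *= radius;                // scale the cap by the radius
  v.y *= radius;
  v.z *= radius;
  v.y += ySign * 0.5f * height; // push the cap back out to the final height
  return v;
}
```

For example, with a height of 4 and a radius of 0.5, the top pole of the cached mesh at `(0, 1.5, 0)` ends up at `(0, 2.5, 0)`: half the height plus one radius.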

However, I spent 2 hours past midnight just scratching my head, trying to figure out why some of my debug draw meshes pop around as I shift and rotate the camera. It was as if the positional pops are dependent on the camera position and orientation, which was quite bizarre. It finally occurred to me that I might not have been consistently getting vertex positions in object space in the vertex shader, and based on that assumption I found this post that confirmed my suspicion.

Basically, Unity has draw call batching turned on by default, so it inconsistently passed in vertex positions to vertex shaders in either object space or world space. It’s actually stated in Unity’s documentation here under the not-so-obvious `DisableBatching` tag section, that vertex shaders operating in object space won’t work reliably if draw call batching is on.

Although the process of figuring out what went wrong was annoying, the fix was luckily quite simple: just disable draw call batching in the vertex shaders.

Tags { "DisableBatching" = "true" }

That’s it! I hope you find this post interesting. I will likely continue to document my ventures into the Unity world.

The source files for generating the animations in this post are on GitHub.

A Chinese translation of this post is available here (by Wayne Chen).

Timeslicing is a very useful technique to improve the performance of batched algorithms (multiple instances of the same algorithm): instead of running all instances of algorithms in a single frame, spread them across multiple frames.

For instance, if you have 100 NPCs in a game level, you typically don’t need to have every one of them make a decision in every single frame; having 50 NPCs make decisions in each frame would effectively reduce the decision performance overhead by 50%, 25 NPCs by 75%, and 20 NPCs by 80%.

Note that I said timeslicing the decisions, __not__ the whole update logic of the NPCs. In every frame, we’d still want to animate every NPC, or at least the ones closer and more noticeable to the player, based on the **latest decision**. The extra animation layer can usually hide the slight latency in the timesliced decision layer.

Also bear in mind that I will not be discussing how to finish a single algorithm across multiple frames, which is another form of timeslicing that is not within the scope of this post. Rather, this post will focus on spreading multiple instances of the same algorithm across multiple frames, where each instance is small enough to fit in a single frame.

This timeslicing technique applies to batched algorithms that are not hyper-sensitive to latency. If even a single frame of latency is critical to certain batched algorithms, it’s probably not a good idea to timeslice them.

In this post, I’d like to cover:

- An example that involves running multiple instances of a simple algorithm in batch.
- How to timeslice such batched algorithms.
- A categorization for timeslicing based on the timing of input and output.
- A sample implementation of a timeslicer utility class.
- Finally, how threads can be brought into the mix.

The example I’m going to use is a simple logic that orients NPCs to face a target. Each NPC’s decision layer computes the desired orientation to face the target, and the animation layer tries to rotate the NPCs to match their desired orientation, capped at a maximum angular speed.

First, let’s see an animated illustration of what it might look like if this algorithm is run for every NPC in every frame (Update All).

The moving circle is the target, the black pointers represent NPCs and their orientation, and the red indicators represent the NPCs’ desired orientation.

And the code looks something like this:

```cpp
void NpcManager::UpdateFrame(float dt)
{
  for (Npc &npc : m_npcs)
  {
    npc.UpdateDesiredOrientation(target);
    npc.Animate(dt);
  }
}

void Npc::UpdateDesiredOrientation(const Object &target)
{
  m_desiredOrientation = LookAt(target);
}

void Npc::Animate(float dt)
{
  Rotation delta = Diff(m_desiredOrientation, m_currentOrientation);
  // cap the rotation for this frame by the max angular speed
  delta = Limit(delta, m_maxAngularSpeed * dt);
  m_currentOrientation = Apply(m_currentOrientation, delta);
}
```

As mentioned above, you typically don’t need to update all the NPCs’ decisions in one frame. We can achieve rudimentary timeslicing like this:

```cpp
void NpcManager::UpdateFrame(float dt)
{
  // only a few NPCs make a decision in each frame
  const unsigned kMaxUpdates = 4;
  unsigned npcUpdated = 0;
  while (npcUpdated < m_numNpcs && npcUpdated < kMaxUpdates)
  {
    m_npcs[m_iNpcWalker].UpdateDesiredOrientation(target);
    m_iNpcWalker = (m_iNpcWalker + 1) % m_numNpcs;
    ++npcUpdated;
  }

  // every NPC is still animated in every frame
  for (Npc &npc : m_npcs)
  {
    npc.Animate(dt);
  }
}
```

This straightforward approach could be enough. However, sometimes you just need more control over the timing of input and output. Using the more involved timeslicing logic presented below, you can have a choice of different timing of input and output to suit specific needs.

Before going any further, let’s take a look at the terminology that will be used throughout this post.

- Completing a **batch** means finishing running the decision logic once for each NPC.
- A **job** represents the work to run an instance of decision logic for an NPC.
- The **input** is the data required to run a job.
- The **output** is the results from a job after it’s finished.

Now the timeslicing logic.

Here are the steps of one way to timeslice batched algorithms. It’s probably not the absolute best in terms of efficiency or memory usage, but I find it logically clear and easy to maintain (which also means it’s good for presentational purposes). So unless you absolutely need to micro-optimize, I wouldn’t worry about it too much.

- Start a new batch.
- Figure out all the jobs that need to be done. Associate each job with a unique **key** that can be used to infer the required input for the job.
- For each job, prepare an instance of job **parameters** that is a collection of its key, input, and output.
- Start and finish up to a max number of jobs per frame.
- Depending on the timing of output (more on this later), **save** the **results** of a job, including the job’s output and its associated key, by pushing it to a **ring buffer** that represents the **history** of job results. The rest of the game logic can query the latest results by key.
- After all jobs are finished, the batch is finished. Rinse and repeat.

One advantage of looking up output by key is that different timesliced systems can work with each other just fine, even if they reference each other’s output. As far as a system is concerned, it’s looking up output from another system using a key, and the other system is reporting back the latest valid output available associated with the given key. Sort of like a mini database.

In our example, since each job is associated with an NPC, it seems fitting to use the NPCs as individual keys.

Next, here’s a categorization of timeslicing, based on the timing of reading input and saving output.

NOTE: The use of words “synchronous” and “asynchronous” here has nothing to do with multi-threading. The words are only used to distinguish the timing of operations. Everything presented before the “Threads” section later in this post is single-threaded.

- **Asynchronous Input**: input is read by a job only when it’s started.
- **Synchronous Input**: input is read by all jobs when a new batch starts.
- **Asynchronous Output**: a job’s output is saved as soon as the job finishes.
- **Synchronous Output**: output of all jobs is saved when a batch finishes.

A ring buffer is used so that the rest of the game logic can be completely agnostic to the timing, and assume that the output (queried by key) is the latest.
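The history ring buffer does not have to be anything fancy. Here is a minimal fixed-capacity sketch in C++ of what such a container could look like, together with a newest-first lookup by key. This is an illustration under my own assumptions rather than a specific library: `FromNewest`, `Result`, and `GetLatest` are hypothetical names, and keys are plain `int`s to keep the example short.

```cpp
#include <cassert>
#include <cstddef>

template <typename T, std::size_t kCapacity>
class RingBuffer
{
  public:
    bool IsEmpty() const { return m_size == 0; }
    bool IsFull() const { return m_size == kCapacity; }
    std::size_t Size() const { return m_size; }

    // Appends the newest element; the caller makes room first if full.
    void Enqueue(const T &value)
    {
      assert(!IsFull());
      m_data[(m_begin + m_size) % kCapacity] = value;
      ++m_size;
    }

    // Drops the oldest element.
    void Dequeue()
    {
      assert(!IsEmpty());
      m_begin = (m_begin + 1) % kCapacity;
      --m_size;
    }

    // i == 0 is the newest element, i == Size() - 1 is the oldest.
    const T &FromNewest(std::size_t i) const
    {
      assert(i < m_size);
      return m_data[(m_begin + m_size - 1 - i) % kCapacity];
    }

  private:
    T m_data[kCapacity] = {};
    std::size_t m_begin = 0;
    std::size_t m_size = 0;
};

struct Result { int key; float output; };

// Searches newest-first, so an older result for the same key is never returned.
template <std::size_t kCapacity>
bool GetLatest(const RingBuffer<Result, kCapacity> &history, int key, float *pOutput)
{
  for (std::size_t i = 0; i < history.Size(); ++i)
  {
    const Result &result = history.FromNewest(i);
    if (result.key == key)
    {
      *pOutput = result.output;
      return true;
    }
  }
  return false;
}
```

Because the lookup walks from the newest entry backwards, the caller never needs to know whether a result was saved one frame ago or a whole batch ago.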

Mixing and matching different timing of input and output gives 4 combinations: async input / async output (AIAO), sync input / sync output (SISO), sync input / async output (SIAO), and async input / sync output (AISO). Let’s look at them one by one.

For demonstrational purposes, all animated illustrations below reflect a setup where only one job is started in each frame. The number should be set higher in a real game if it is introducing unacceptable latency.

For our specific example of NPCs turning to face the target, the AIAO combination probably makes the most sense. The input is read only when the job starts, so the job has the latest position of the target. The output is saved as soon as the job finishes with results of NPC’s desired orientation, so the NPC’s animation layer can react to the latest desired orientation immediately.

Here’s an animated illustration of what it could look like if we run the jobs at 10Hz (10 NPC jobs per second).

And here’s what it looks like if done at 30Hz.

You can see that each NPC waits until its job starts before getting the latest position of the target, and updates its desired orientation as soon as the job finishes.

For cases where the asynchronous input of the AIAO combination shown above causes unwanted staggering, yet NPCs should still react as soon as each of their jobs finishes, we can use the SIAO combination.

Here’s the 10Hz version.

And here’s the 30Hz version.

Note that when each job starts, it’s using the same target position as input, which has been synchronized at the start of each batch, while the output is saved for immediate NPC reaction as soon as each job finishes.

This is effectively the same as the first “basic first attempt” at timeslicing shown above.

The SISO combination is probably best explained by looking at the animated illustrations first. In order, below are the 10Hz and 30Hz versions of this combination.

It’s basically a “laggy” version of the very first animated illustration where every NPC is fully updated in every frame. All job input is synchronized upon batch start, and all output is saved out upon batch finish. Essentially this is kind of a “double buffer”, where the latest results aren’t reflected until all jobs in a batch are finished. For this reason, the history ring buffer must be **at least twice as large** as the max batch size for combinations with **synchronized output** to work properly.

The SISO combination is probably not ideal for our specific example. However, for cases like updating influence maps, heat maps, or any kind of game space analysis, the SISO combination could prove useful.

To be frank, I can’t think of a proper scenario to justify the use of the AISO combination. It’s only included here for completeness. See the animated illustrations below in the order of the 10Hz version and 30Hz version. If you can think of a case where the AISO combination is a superior choice to the other three, please share your ideas in the comments or email me. I’d really like to know.

Now that we’ve seen all four combinations of timeslicing, it’s time to look at a sample implementation that does exactly what has been shown above.

Before going straight to the core timeslicing logic, let’s first look at how it plugs into the sample NPC code we saw earlier.

The timeslicer utility class allows users to provide a function that sets up keys for a new batch (writes to an array and returns new batch size), a function to set up input for job (writes to input based on key), and a function that is the logic to be timesliced (writes to output based on key and input).

```cpp
class NpcManager
{
  private:
    struct NpcJobInput
    {
      Point m_targetPos;
    };

    struct NpcJobOutput
    {
      Orientation m_desiredOrientation;
    };

    // timeslicing utility class
    Timeslicer
    <
      Npc*,         // key
      NpcJobInput,  // input
      NpcJobOutput, // output
      kMaxNpcs,     // max batch size
      false,        // sync input flag (false = async)
      false         // sync output flag (false = async)
    >
    m_npcTimeslicer;

    // ...other stuff
};

void NpcManager::Init()
{
  // set up keys for new batch
  auto newBatchFunc = [this](Npc **aKey)->unsigned
  {
    for (unsigned i = 0; i < m_numNpcs; ++i)
    {
      aKey[i] = GetNpc(i);
    }
    return m_numNpcs;
  };

  // set up input for job
  auto setUpInputFunc = [this](Npc *pNpc, NpcJobInput *pInput)->void
  {
    pInput->m_targetPos = GetTargetPosition(pNpc);
  };

  // logic to be timesliced
  auto jobFunc = [this](Npc *pNpc, const NpcJobInput &input, NpcJobOutput *pOutput)->void
  {
    pOutput->m_desiredOrientation = LookAt(pNpc, input.m_targetPos);
  };

  // initialize timeslicer
  m_npcTimeslicer.Init
  (
    newBatchFunc,
    setUpInputFunc,
    jobFunc
  );
}

void NpcManager::UpdateFrame(float dt)
{
  // timeslice decision logic
  m_npcTimeslicer.Update(maxJobsPerFrame);

  // animate all NPCs based on latest decision results
  for (Npc &npc : m_npcs)
  {
    NpcJobOutput output;
    // GetOutput returns true if a result exists for this key
    if (m_npcTimeslicer.GetOutput(&npc, &output))
    {
      npc.SetDesiredOrientation(output.m_desiredOrientation);
    }
    npc.Animate(dt);
  }
}
```

And below is the timeslicer utility class in its entirety.

```cpp
template
<
  typename Key,
  typename Input,
  typename Output,
  unsigned kMaxBatchSize,
  bool kSyncInput,
  bool kSyncOutput
>
class Timeslicer
{
  private:
    struct JobParams
    {
      Key m_key;
      Input m_input;
      Output m_output;
    };

    struct JobResults
    {
      Key m_key;
      Output m_output;
    };

    // number of jobs in current batch
    unsigned m_batchSize;

    // keep track of jobs in current frame
    unsigned m_iJobBegin;
    unsigned m_iJobEnd;

    // required to start jobs
    JobParams m_aJobParams[kMaxBatchSize];

    // keep track of job results (statically allocated)
    static const unsigned kMaxHistorySize =
      kSyncOutput
        ? 2 * kMaxBatchSize // sync output is double-buffered
        : kMaxBatchSize;
    typedef RingBuffer<JobResults, kMaxHistorySize> History;
    History m_history;

    // set up keys for new batch
    // (number of keys = batch size = jobs per batch)
    typedef std::function<unsigned (Key *)> NewBatchFunc;
    NewBatchFunc m_newBatchFunc;

    // set up input for job
    typedef std::function<void (Key, Input *)> SetUpInputFunc;
    SetUpInputFunc m_setUpInputFunc;

    // logic to be timesliced
    // (takes key and input, writes output)
    typedef std::function<void (Key, const Input &, Output *)> JobFunc;
    JobFunc m_jobFunc;

  public:
    void Init
    (
      NewBatchFunc newBatchFunc,
      SetUpInputFunc setUpInputFunc,
      JobFunc jobFunc
    )
    {
      m_newBatchFunc = newBatchFunc;
      m_setUpInputFunc = setUpInputFunc;
      m_jobFunc = jobFunc;
      Reset();
    }

    void Reset()
    {
      m_batchSize = 0;
      m_iJobBegin = 0;
      m_iJobEnd = 0;
    }

    bool GetOutput(Key key, Output *pOutput) const
    {
      // iterate from newest history (last queued output)
      for (const JobResults &results : m_history.Reverse())
      {
        if (key == results.m_key)
        {
          *pOutput = results.m_output;
          return true;
        }
      }
      return false;
    }

    void Update(unsigned maxJobsPerUpdate)
    {
      TryStartNewBatch();
      StartJobs(maxJobsPerUpdate);
      FinishJobs();
    }

  private:
    void TryStartNewBatch()
    {
      if (m_iJobBegin == m_batchSize)
      {
        // synchronous output saved on batch finish
        if (kSyncOutput)
        {
          for (unsigned i = 0; i < m_batchSize; ++i)
          {
            const JobParams &params = m_aJobParams[i];
            SaveResults(params);
          }
        }

        Reset();

        Key aKey[kMaxBatchSize];
        m_batchSize = m_newBatchFunc(aKey);
        for (unsigned i = 0; i < m_batchSize; ++i)
        {
          JobParams &params = m_aJobParams[i];
          params.m_key = aKey[i];

          // synchronous input set up on new batch start
          if (kSyncInput)
          {
            m_setUpInputFunc(params.m_key, &params.m_input);
          }
        }
      }
    }

    void StartJobs(unsigned maxJobsPerUpdate)
    {
      unsigned numJobsStarted = 0;
      while (m_iJobEnd < m_batchSize && numJobsStarted < maxJobsPerUpdate)
      {
        JobParams &params = m_aJobParams[m_iJobEnd];

        // asynchronous input set up on job start
        if (!kSyncInput)
        {
          m_setUpInputFunc(params.m_key, &params.m_input);
        }

        m_jobFunc
        (
          params.m_key,
          params.m_input,
          &params.m_output
        );

        ++m_iJobEnd;
        ++numJobsStarted;
      }
    }

    void FinishJobs()
    {
      while (m_iJobBegin < m_iJobEnd)
      {
        const JobParams &params = m_aJobParams[m_iJobBegin++];

        // asynchronous output saved on job finish
        if (!kSyncOutput)
        {
          SaveResults(params);
        }
      }
    }

    void SaveResults(const JobParams &params)
    {
      JobResults results;
      results.m_key = params.m_key;
      results.m_output = params.m_output;

      // overwrite the oldest results once the history is full
      if (m_history.IsFull())
      {
        m_history.Dequeue();
      }
      m_history.Enqueue(results);
    }
};
```

If your game engine allows multi-threading, we can go one step further by offloading jobs to threads. Starting a job now creates a thread that runs the timesliced logic, and finishing a job now waits for the thread to finish. We need to use read/write locks to make sure the timeslicer plays nicely with the rest of game logic. Required changes to code are highlighted below.

```cpp
class Timeslicer
{
  // ...unchanged code omitted

  mutable RwLock m_lock; // added for threading (mutable so const readers can lock)

  struct JobParams
  {
    std::thread m_thread; // added for threading
    Key m_key;
    Input m_input;
    Output m_output;
  };

  bool GetOutput(Key key, Output *pOutput) const
  {
    ReadAutoLock readLock(m_lock); // added for threading

    // iterate from newest history (last queued output)
    for (const JobResults &results : m_history.Reverse())
    {
      if (key == results.m_key)
      {
        *pOutput = results.m_output;
        return true;
      }
    }
    return false;
  }

  void TryStartNewBatch()
  {
    WriteAutoLock writeLock(m_lock); // added for threading

    if (m_iJobBegin == m_batchSize)
    {
      // synchronous output saved on batch finish
      if (kSyncOutput)
      {
        for (unsigned i = 0; i < m_batchSize; ++i)
        {
          const JobParams &params = m_aJobParams[i];
          SaveResults(params);
        }
      }

      Reset();

      Key aKey[kMaxBatchSize];
      m_batchSize = m_newBatchFunc(aKey);
      for (unsigned i = 0; i < m_batchSize; ++i)
      {
        JobParams &params = m_aJobParams[i];
        params.m_key = aKey[i];

        // synchronous input set up on new batch start
        if (kSyncInput)
        {
          m_setUpInputFunc(params.m_key, &params.m_input);
        }
      }
    }
  }

  void StartJobs(unsigned maxJobsPerUpdate)
  {
    WriteAutoLock writeLock(m_lock); // added for threading

    unsigned numJobsStarted = 0;
    while (m_iJobEnd < m_batchSize && numJobsStarted < maxJobsPerUpdate)
    {
      JobParams &params = m_aJobParams[m_iJobEnd];

      // asynchronous input set up on job start
      if (!kSyncInput)
      {
        m_setUpInputFunc(params.m_key, &params.m_input);
      }

      // added for threading: run the job on its own thread
      // (capture this as well, since m_jobFunc is a member)
      params.m_thread = std::thread([this, &params]()->void
      {
        m_jobFunc
        (
          params.m_key,
          params.m_input,
          &params.m_output
        );
      });

      ++m_iJobEnd;
      ++numJobsStarted;
    }
  }

  void FinishJobs()
  {
    WriteAutoLock writeLock(m_lock); // added for threading

    while (m_iJobBegin < m_iJobEnd)
    {
      JobParams &params = m_aJobParams[m_iJobBegin++];

      params.m_thread.join(); // added for threading: wait for the job

      // asynchronous output saved on job finish
      if (!kSyncOutput)
      {
        SaveResults(params);
      }
    }
  }
};
```

If your game can afford to have one more frame of latency and you don’t want the timeslicer squatting a thread, you can tweak the update function a bit, where jobs are started at the end of update in the current frame, and are finished at the beginning of update in the next frame.

```cpp
void Timeslicer::Update(unsigned maxJobsPerUpdate)
{
  // finish the jobs started in the previous frame first
  FinishJobs();
  TryStartNewBatch();
  StartJobs(maxJobsPerUpdate);
}
```

That’s it! We’ve seen how timeslicing batched algorithms can help with game performance, as well as the 4 combinations of input and output with different timing, each having its own use (well, maybe not the last one). We’ve also seen how the timeslicing logic can be further adapted to make use of threads.

I hope you find this useful.

Also, this post is part 2 of a series (part 1) leading up to a geometric interpretation of Fourier transform and spherical harmonics.

Drawing an analogy from vector projection, we have seen what it means to “project” a curve onto another in the previous post. This time, we’ll see how to find the closest vector on a plane via vector projection, and then we’ll see how it translates to finding the best approximation of a curve via curve “projection”. This handy analogy can help us take another step closer to a geometric interpretation of Fourier transform and spherical harmonics later.

Given vectors , , and , the closest vector on the plane formed (or “spanned” in linear algebra jargon) by and is the projection of onto the plane. This projection, denoted , is a combination of scaled and , in the form of , that has the least error from .

The error is measured by the magnitude of the difference vector:

As pointed out in the previous post, minimizing this error is essentially equivalent to minimizing the root mean square error (RMSE):

This is what the relationship of , , , and looks like visually:

The projection of onto the plane spanned by and , is the vector on the plane that has the least error from , and the difference vector is orthogonal to the plane.

So how do we compute ? In the previous post we’ve seen how to project a vector onto another, so would computing be as simple as projecting onto , and then project the result again onto ? Not really. Here’s why:

As you can see in the figure above, isn’t parallel to nor . Projecting onto would give you a vector that is parallel to , and a subsequent projection onto would leave you with a result that is parallel to , which is definitely not .

One way to do it is to calculate a vector orthogonal to the plane, i.e. a plane normal , by taking the cross product of the two vectors that span the plane: . Then, take out the part in that is parallel to by subtracting the projection of onto from . What is left of is the part of that is parallel to the plane, i.e. the projection:
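Written out with assumed names (the symbols here are my own labels for illustration: $\vec{a}$ and $\vec{b}$ span the plane, $\vec{c}$ is the vector being projected, $\vec{n}$ is the plane normal, and $\vec{p}$ is the projection), the normal-based approach reads:

```latex
\vec{n} = \vec{a} \times \vec{b},
\qquad
\vec{p} = \vec{c} - \frac{\vec{c} \cdot \vec{n}}{\vec{n} \cdot \vec{n}} \, \vec{n}
```

The fraction times $\vec{n}$ is the projection of $\vec{c}$ onto the normal, so subtracting it leaves only the part of $\vec{c}$ that lies in the plane.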

But, I want to talk about another way of performing the projection, which is easier to translate to curves later. and are not necessarily orthogonal to each other. Let’s find two orthogonal vectors that lie on the plane spanned by and . Then, we split into two parts, one parallel to one vector and one parallel to the other vector. Finally, we combine these two parts together to obtain a vector that is essentially the part of that is parallel to the plane.

As a simple illustration, if the plane is the X-Z plane, then the obvious two orthogonal vectors of choice would be (1, 0, 0) and (0, 0, 1). To project a vector (x, y, z) onto the X-Z plane, we split it into a part that is parallel to (1, 0, 0), which is (x, 0, 0), and a part that is parallel to (0, 0, 1), which is (0, 0, z). Combining those two parts together would give us (x, 0, z). This makes sense, because projecting a vector onto the X-Z plane is just as simple as dropping the Y component.

Now, given two arbitrary vectors and that span a plane, we can generate two orthogonal vectors, denoted and , by using a method called the Gram-Schmidt process. The first vector would simply be the . To compute the second vector , we take away from its part that is parallel to ; what’s left of is orthogonal to :

To compute , we combine the parts of that are parallel to and , respectively:
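As a sanity check, both ways of projecting onto a plane can be compared numerically. The following is a minimal C++ sketch under my own naming (`a` and `b` span the plane, `c` is the vector being projected; `ProjectOnto`, `ProjectOntoPlaneViaNormal`, and `ProjectOntoPlaneViaGramSchmidt` are hypothetical helper names), not code from any particular library.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

Vec3 operator+(const Vec3 &a, const Vec3 &b) { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
Vec3 operator-(const Vec3 &a, const Vec3 &b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
Vec3 operator*(double s, const Vec3 &v) { return { s * v.x, s * v.y, s * v.z }; }

double Dot(const Vec3 &a, const Vec3 &b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

Vec3 Cross(const Vec3 &a, const Vec3 &b)
{
  return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

// The part of v that is parallel to axis.
Vec3 ProjectOnto(const Vec3 &v, const Vec3 &axis)
{
  const double axisLenSq = Dot(axis, axis);
  assert(axisLenSq > 0.0);
  return (Dot(v, axis) / axisLenSq) * axis;
}

// Method 1: remove the part of c that is parallel to the plane normal.
Vec3 ProjectOntoPlaneViaNormal(const Vec3 &a, const Vec3 &b, const Vec3 &c)
{
  const Vec3 n = Cross(a, b);
  return c - ProjectOnto(c, n);
}

// Method 2: build two orthogonal vectors on the plane (Gram-Schmidt),
// then combine the parts of c parallel to each of them.
Vec3 ProjectOntoPlaneViaGramSchmidt(const Vec3 &a, const Vec3 &b, const Vec3 &c)
{
  const Vec3 u = a;
  const Vec3 v = b - ProjectOnto(b, u);
  return ProjectOnto(c, u) + ProjectOnto(c, v);
}
```

With `a = (1, 0, 0)` and `b = (1, 0, 1)`, which span the X-Z plane without being orthogonal, both functions project `c = (2, 3, 4)` to `(2, 0, 4)`.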

The Gram-Schmidt process is actually more general than described above. It can apply to higher dimensions. Given vectors, denoted to , in an -dimensional space (), and if the vectors are linearly independent, i.e. they span an -dimensional subspace, then we can generate vectors that are orthogonal to each other, denoted through , spanning the same subspace, using the Gram-Schmidt process.

The first vector would simply be . To compute the second vector , we take away from its part that is parallel to . To compute the third vector , we take away from its part that is parallel to **all previously generated orthogonal vectors**, and . Repeat this process until we have reached and produced :
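In symbols, using assumed names ($\vec{v}_1, \dots, \vec{v}_m$ for the given vectors and $\vec{u}_1, \dots, \vec{u}_m$ for the generated orthogonal ones; these labels are mine, chosen for illustration), the general Gram-Schmidt process can be written as:

```latex
\vec{u}_1 = \vec{v}_1,
\qquad
\vec{u}_k = \vec{v}_k - \sum_{j=1}^{k-1}
  \frac{\vec{v}_k \cdot \vec{u}_j}{\vec{u}_j \cdot \vec{u}_j} \, \vec{u}_j
\quad \text{for } k = 2, \dots, m
```

Each term in the sum is the part of $\vec{v}_k$ that is parallel to one of the previously generated vectors, so what remains is orthogonal to all of them.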

Projecting an -dimensional vector onto this -dimensional subspace would involve combining the parts of the vector parallel to each of the orthogonal vectors. In our example above that involves 3D vectors, and . In higher dimensions, no simple 3D cross products can save you there.

Now we are done with vectors. Let’s take a look at curves!

Let’s say our interval of interest is . Given a 3rd-order polynomial curve , what’s the best approximation using a 2nd-order polynomial curve, or a 1st-order polynomial curve (straight line)? How about simply dropping the higher-order terms, so we get and ? Here’s what they look like:

At first glance, I’d say and are not what we want. We can definitely find a parabolic curve and a line that approximate better. Look at just how far apart and are from at . Clearly, and are not the 2nd-order and 1st-order polynomial curves that have the least RMSEs from . Simply dropping higher-order terms turns out to be a naive approach. The right way to do it is just like what we did with vectors: projection.

In the vector example above, we were operating in the 3D geometric space. Now we are working with a more abstract 3rd-order polynomial space where lives in. The lower-order polynomial curve that has the least RMSE from is the projection of into that lower-order polynomial space. Let’s start with finding the 2nd-order polynomial curve that has the least RMSE from .

The 2nd-order polynomial subspace is 3-dimensional, since a 2nd-order polynomial curve has the form . Let’s first find 3 curves that span the subspace. An easy pick would be , , and . Now we need to use them to generate a set of orthogonal curves, , , and using the Gram-Schmidt process:

If you forgot how to “project” a curve onto another, please refer to the previous post.

Here are the results:

You can say that the three resulting curves are a set of orthogonal axes spanning the 2nd-order polynomial subspace. Now we split the original curve into three orthogonal parts by projecting it onto each of these axes:

Here’s what the three parts look like alongside the original curve:

Some of the parts might not look like they are close to the original curve, but each one is the closest curve you can get along its axis, i.e. the one with the least RMSE from the original curve.

Now, we can combine the three orthogonal parts to form the 2nd-order polynomial curve that is the best approximation of the original curve:

This looks way better than the result of simply dropping the 3rd-order term, as shown in the figure above.

Since the three parts are already orthogonal, we can actually obtain the 1st-order polynomial curve that best approximates the original curve by simply dropping the part along the third axis from the 2nd-order approximation:

Also looking good, compared to simply dropping the 3rd-order and 2nd-order terms.
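To make the procedure concrete, here is a rough C++ sketch of Gram-Schmidt and projection on polynomials. It rests on assumptions that are not taken from the post: the interval of interest is assumed to be $[0, 1]$, the curve being approximated is the made-up example $t^3$, and the names `Poly`, `PolyDot`, `GramSchmidt`, and `Project` are hypothetical. The "dot product" is the integral of the product over the interval, which for monomials on $[0, 1]$ is simply $\int_0^1 t^{i+j}\,dt = 1/(i+j+1)$.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Polynomial coefficients, lowest order first: {c0, c1, c2} is c0 + c1*t + c2*t^2.
using Poly = std::vector<double>;

// "Dot product" of two polynomials: integral of a(t)*b(t) over [0, 1] (assumed interval).
double PolyDot(const Poly& a, const Poly& b) {
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    for (std::size_t j = 0; j < b.size(); ++j) {
      sum += a[i] * b[j] / static_cast<double>(i + j + 1);
    }
  }
  return sum;
}

// Returns a + s * b.
Poly AddScaled(const Poly& a, const Poly& b, double s) {
  Poly result(std::max(a.size(), b.size()), 0.0);
  for (std::size_t i = 0; i < a.size(); ++i) result[i] += a[i];
  for (std::size_t i = 0; i < b.size(); ++i) result[i] += s * b[i];
  return result;
}

// Gram-Schmidt: from each curve, take away its parts parallel to all
// previously generated orthogonal curves.
std::vector<Poly> GramSchmidt(const std::vector<Poly>& basis) {
  std::vector<Poly> ortho;
  for (const Poly& v : basis) {
    Poly u = v;
    for (const Poly& prev : ortho) {
      const double lengthSq = PolyDot(prev, prev);
      assert(lengthSq > 0.0);  // fails if the input curves are not independent
      u = AddScaled(u, prev, -PolyDot(v, prev) / lengthSq);
    }
    ortho.push_back(u);
  }
  return ortho;
}

// Projects f onto the subspace spanned by the orthogonal curves by
// summing the parts of f parallel to each of them.
Poly Project(const Poly& f, const std::vector<Poly>& ortho) {
  Poly result;
  for (const Poly& u : ortho) {
    result = AddScaled(result, u, PolyDot(f, u) / PolyDot(u, u));
  }
  return result;
}
```

Under these assumptions, orthogonalizing $1$, $t$, $t^2$ yields $1$, $t - \frac{1}{2}$, and $t^2 - t + \frac{1}{6}$. Projecting $t^3$ onto all three gives $1.5t^2 - 0.6t + 0.05$, and projecting onto only the first two gives $0.9t - 0.2$, which is the same as dropping the third part.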

That’s it. In this post, we’ve seen how to generate a set of orthogonal curves from a set of curves spanning a lower-dimensional subspace of curves, and use the orthogonal curves to find the best approximation of a curve via curve “projection”.

We now have all the tools we need to move on to Fourier transform and spherical harmonics in the next post. Finally, something game-related!

This post is part 1 of a series (part 2) leading up to a geometric interpretation of Fourier transform and spherical harmonics.

Fourier transform and spherical harmonics are mathematical tools that can be used to represent a function as a combination of periodic functions (functions that repeat themselves, like sine waves) of different frequencies. You can approximate a complex function by using a limited number of periodic functions at certain frequencies. Fourier transform is often used in audio processing to post-process signals as a combination of sine waves of different frequencies, instead of single streams of sound waves. Spherical harmonics can be used to approximate baked ambient lights in game levels.

We’ll revisit these tools in later posts, so it’s okay if you’re still not clear how they can be of use at this point. First, let’s start somewhere more basic.

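If you have two vectors $\vec{a}$ and $\vec{b}$, **projecting $\vec{a}$ onto $\vec{b}$** means stripping out the part of $\vec{a}$ that is orthogonal to $\vec{b}$, so the result is the part of $\vec{a}$ that is parallel to $\vec{b}$. Here is a figure I borrowed from Wikipedia:

The **dot product** of $\vec{a}$ and $\vec{b}$ is a scalar equal to the product of the magnitudes of both vectors and the cosine of the angle ($\theta$, using the figure above) between the vectors:

$$\vec{a} \cdot \vec{b} = |\vec{a}| |\vec{b}| \cos\theta$$

Another way of calculating the dot product is adding together the component-wise products. If $\vec{a} = (a_x, a_y, a_z)$ and $\vec{b} = (b_x, b_y, b_z)$, then:

$$\vec{a} \cdot \vec{b} = a_x b_x + a_y b_y + a_z b_z$$

A follow-up to the alternate formula above is the formula for vector magnitude. The magnitude of a vector is the square root of the dot product of the vector with itself:

$$|\vec{a}| = \sqrt{\vec{a} \cdot \vec{a}}$$

The geometric meaning of the dot product $\vec{a} \cdot \vec{b}$ is the magnitude of the projection of $\vec{a}$ onto $\vec{b}$ **scaled** by $|\vec{b}|$. Dot product is commutative, i.e. $\vec{a} \cdot \vec{b} = \vec{b} \cdot \vec{a}$, which means that $\vec{a} \cdot \vec{b}$ is also equal to the magnitude of the projection of $\vec{b}$ onto $\vec{a}$ scaled by $|\vec{a}|$.

So if you want to get the magnitude of the projection of $\vec{a}$ onto $\vec{b}$, you need to divide the dot product by the magnitude of $\vec{b}$:

$$\frac{\vec{a} \cdot \vec{b}}{|\vec{b}|}$$

To get the actual projected vector of $\vec{a}$ onto $\vec{b}$, multiply the magnitude with the unit vector in the direction of $\vec{b}$, denoted by $\hat{b}$:

$$\frac{\vec{a} \cdot \vec{b}}{|\vec{b}|} \hat{b} = \frac{\vec{a} \cdot \vec{b}}{\vec{b} \cdot \vec{b}} \vec{b}$$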

One important property of dot product is: if it’s positive, the two vectors point in roughly the same direction ($\theta < 90^\circ$); if it’s zero, the vectors are orthogonal ($\theta = 90^\circ$); if it’s negative, the vectors point away from each other ($\theta > 90^\circ$).

For the dot product of two unit vectors, like $\hat{a} \cdot \hat{b}$, it’s just $\cos\theta$. If it’s 1, then the two vectors point in exactly the same direction ($\theta = 0^\circ$); if it’s -1, then the two vectors point in exactly opposite directions ($\theta = 180^\circ$). So, in order to measure how close the directions of two vectors are, we can normalize both vectors and take their dot product.

Let’s say we have three vectors: $\vec{a}$, $\vec{b}$, and $\vec{c}$. If we want to determine which of $\vec{b}$ and $\vec{c}$ points in a direction closer to where $\vec{a}$ points, we can just compare the dot products of their normalized versions, $\hat{a} \cdot \hat{b}$ and $\hat{a} \cdot \hat{c}$. Whichever vector’s normalized version has a larger dot product with $\hat{a}$ points in a direction closer to that of $\vec{a}$.

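A metric often used to measure the difference between two data objects is the root mean square error (RMSE), which is the square root of the average of the squared component-wise errors. For $n$-dimensional vectors $\vec{a}$ and $\vec{b}$, that means:

$$\mathrm{RMSE}(\vec{a}, \vec{b}) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (a_i - b_i)^2}$$

It kind of makes sense, because it is exactly the magnitude of the vector that is the difference between $\vec{a}$ and $\vec{b}$, scaled by $\frac{1}{\sqrt{n}}$:

$$\mathrm{RMSE}(\vec{a}, \vec{b}) = \frac{|\vec{a} - \vec{b}|}{\sqrt{n}}$$

It’s also the square root of the dot product of the difference vector with itself, scaled by $\frac{1}{\sqrt{n}}$:

$$\mathrm{RMSE}(\vec{a}, \vec{b}) = \frac{\sqrt{(\vec{a} - \vec{b}) \cdot (\vec{a} - \vec{b})}}{\sqrt{n}}$$

Here’s an important property of projection:

The projection of a vector $\vec{a}$ onto another vector $\vec{b}$ is the vector parallel to $\vec{b}$ that has the **minimal RMSE** with respect to $\vec{a}$. In other words, it gives you **the best scaled version of $\vec{b}$ to approximate $\vec{a}$**.

Also note that if $\hat{a} \cdot \hat{b}$ is larger than $\hat{a} \cdot \hat{c}$, it means $\hat{b}$ has a smaller RMSE than $\hat{c}$ with respect to $\hat{a}$; thus, $\vec{b}$ points in a direction closer to that of $\vec{a}$ than $\vec{c}$ does.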
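The properties above are easy to check numerically. Below is a minimal C++ sketch; the names `Rmse`, `ProjectionScale`, `Scaled`, and `CosAngle` are hypothetical helpers written for this illustration, and vectors are stored as plain `std::vector<double>` of equal size.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Sum of component-wise products.
double Dot(const Vec& a, const Vec& b) {
  assert(a.size() == b.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
  return sum;
}

// Root mean square error: square root of the average squared component-wise error.
double Rmse(const Vec& a, const Vec& b) {
  assert(a.size() == b.size() && !a.empty());
  double sumSq = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const double error = a[i] - b[i];
    sumSq += error * error;
  }
  return std::sqrt(sumSq / static_cast<double>(a.size()));
}

// Scale factor s such that s * b is the projection of a onto b.
double ProjectionScale(const Vec& a, const Vec& b) {
  const double lengthSq = Dot(b, b);
  assert(lengthSq > 0.0);
  return Dot(a, b) / lengthSq;
}

Vec Scaled(const Vec& v, double s) {
  Vec result(v);
  for (double& component : result) component *= s;
  return result;
}

// Dot product of the normalized versions of a and b (cosine of the angle).
double CosAngle(const Vec& a, const Vec& b) {
  return Dot(a, b) / (std::sqrt(Dot(a, a)) * std::sqrt(Dot(b, b)));
}
```

Scaling the second vector by anything other than `ProjectionScale(a, b)` should give a larger `Rmse` with respect to `a`, and the candidate with the larger `CosAngle` is the one pointing in the closer direction.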

Now we’re finished with vectors. It’s time to move on to curves.

Let’s consider these three curves:

When working with curves, as opposed to vectors, we need to additionally specify an interval of interest. For simplicity, we will consider a single fixed interval of length 1 for the rest of this post.

Below is a figure showing what they look like side-by-side within our interval of interest:

Just like vectors, “projecting” one curve onto another gives you the best scaled version of the second curve to approximate the first, and the “projection” has minimal RMSE with respect to the first curve. To compute the RMSE of curves, we need to first figure out how to compute the “dot product” of two curves.

Recall that the dot product of vectors is equal to the sum of component-wise products:

Mirroring that, let’s sum up the products of samples of curves at regular intervals, and normalize the sum by dividing it by the number of samples, so we don’t get drastically different results due to different numbers of samples. If we take 10 samples within the interval to compute the dot product of two of the curves, we get:

The more samples we use, the more accuracy we get. What if we take an infinite number of samples so we get the most accurate result possible?

This basically turns into an integral:

So there we have it, one common definition of the “dot product” of two curves:

**The integral of the product of two curves over the interval of interest**.

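Copying the formula from vectors, the RMSE between two curves $f$ and $g$, sampled at $n$ points $t_1$ through $t_n$, is:

$$\sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( f(t_i) - g(t_i) \right)^2}$$

In integral form, with the integral taken over the interval of interest, it becomes:

$$\sqrt{\int \left( f(t) - g(t) \right)^2 \, dt}$$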

The “mean” part of the error is omitted since it’s a division by 1, the length of our interval of interest.
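As a rough numerical check of the definitions above, here is a small C++ sketch that approximates the curve “dot product” and RMSE by sampling, which is the finite-sample version of the integral. The function names, the use of midpoint samples, and the explicit `t0`/`t1` interval parameters are all assumptions made for illustration; the sample curves in the usage note are made up and are not the curves plotted in this post.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

using Curve = std::function<double(double)>;

// Approximates the integral of f(t)*g(t) over [t0, t1] by summing products
// of samples taken at the midpoints of numSamples equal sub-intervals.
double CurveDot(const Curve& f, const Curve& g, double t0, double t1, int numSamples) {
  assert(numSamples > 0 && t1 > t0);
  const double dt = (t1 - t0) / numSamples;
  double sum = 0.0;
  for (int i = 0; i < numSamples; ++i) {
    const double t = t0 + (i + 0.5) * dt;
    sum += f(t) * g(t);
  }
  return sum * dt;
}

// RMSE between two curves: square root of the mean squared difference.
// For an interval of length 1, the division by (t1 - t0) has no effect.
double CurveRmse(const Curve& f, const Curve& g, double t0, double t1, int numSamples) {
  const Curve diff = [&f, &g](double t) { return f(t) - g(t); };
  return std::sqrt(CurveDot(diff, diff, t0, t1, numSamples) / (t1 - t0));
}

// Scale factor s such that s * g is the "projection" of f onto g.
double CurveProjectionScale(const Curve& f, const Curve& g, double t0, double t1,
                            int numSamples) {
  const double lengthSq = CurveDot(g, g, t0, t1, numSamples);
  assert(lengthSq > 0.0);
  return CurveDot(f, g, t0, t1, numSamples) / lengthSq;
}
```

For instance, with the made-up curves $f(t) = t$ and $g(t) = 1$ on $[0, 1]$, the dot product comes out as $0.5$, so the best constant approximation of $f$ is $0.5$. Increasing `numSamples` moves the sampled result toward the exact integral.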

To find out which of two curves, once “normalized”, has less RMSE with respect to the normalized version of a third reference curve, we take the dot products of the normalized versions of the curves:

One of the two dot products comes out larger than the other. That means the curve with the larger dot product has a lower RMSE with respect to the reference curve.

Drawing analogy from vectors, that curve is conceptually “pointing in a direction” closer to that of the reference curve than the other curve does.

Now let’s try finding the best scaled version of one curve that has minimal RMSE with respect to a target curve, by computing the projection of the target curve onto it:

And this is what the two curves and the projection look like side-by-side:

The projected curve is a scaled-up version of the curve we projected onto, and it is a better approximation of the target curve than the unscaled curve itself; it is the best scaled version, the one that has the least RMSE with respect to the target curve.

Now that you know how to “project” a curve onto another, we will see how to approximate a curve with multiple simpler curves while maintaining minimal error.

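This post is part of the My Career series.

Here is the original English post.

Note: To make it easy to copy to PTT, this post uses BBS-style formatting. My apologies to readers who are not used to it.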

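Uncharted 4 has been released, so I can finally share the parts I was responsible for developing.

I mainly worked on buddy AI for single-player mode, sidekick AI for multiplayer mode, and some gameplay logic.

I'm skipping the parts that did not make it into the final game, as well as some trivial detail work.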

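** = The Post System = **

Before getting started, I'd like to talk about the post system we use to assign positions for NPCs to move to.

I was not responsible for the core logic of this system; what I wrote was client code that uses it.

Posts are discrete positions within walkable space.

Most of them are generated automatically by tools, and some are hand-placed by designers.

Based on different needs, we designed different post rating systems

(e.g. stealth posts, combat posts).

We then pick the highest-rated post and tell the NPC to move there.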

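** = Buddy Follow = **

The buddy follow system was inherited from The Last of Us.

The basic idea is that a buddy finds a follow position around the player.

The candidate follow positions fan out from the player's position

and must satisfy the following path segment clearance conditions:

– player to follow position

– follow position to forward-projected position

– forward-projected position to player

Climbing is a new feature in Uncharted 4 that The Last of Us did not have.

To integrate it with the existing follow system, I used climb posts so buddies can climb along with the player.

This feature turned out to be trickier than I had imagined.

Simply switching the buddy's climbing state based on the player's climbing state did not give good results.

Whenever the player switched quickly between climbing and non-climbing states, the buddy would flicker rapidly between the two states.

So I added hysteresis.

The buddy only follows suit after the player has switched climbing state and moved a certain distance while staying in that state.

Generally speaking, hysteresis is a good way to solve behavioral flickering.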

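** = Buddy Lead = **

In certain scenarios in the game, we want buddies to lead the player forward.

I ported the lead system over from The Last of Us.

Designers use splines to mark in the level the general route along which they want the buddy to lead the player.

If there are multiple lead routes, designers switch the main lead route via script.

The player's position is projected onto the spline and then extended forward to set the lead reference point.

When the lead reference point passes a spline control point marked as a wait point, the buddy heads to the next wait point.

If the player backtracks,

the buddy only turns back once the lead reference point is a certain distance away from the furthest wait point reached during this advancement.

This is also hysteresis used to avoid behavioral flickering.

I also integrated dynamic movement speed adjustment into the lead system.

Based on the distance between the buddy and the player, some “speed planes” are placed along the spline.

Buddies have three motion types: walk, run, and sprint.

Depending on which speed plane the player hits, the buddy picks a different motion type.

In addition, the buddy's locomotion animation speed is fine-tuned based on the player's distance.

The goal is to avoid overly abrupt changes in movement speed when switching motion types.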

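** = Buddy Cover Share = **

In The Last of Us, the player and a buddy can overlap without either of them leaving cover.

We call this cover share.

In The Last of Us, Joel reaches over Ellie and Tess to put his hand on the cover.

This looks natural, because the buddies are all smaller in build than the player.

But the same action does not suit Nate, Sam, Sully, and Elena, who all have similar builds.

Also, Uncharted 4 has a faster gameplay pace.

Having Nate reach out to the cover would only compromise the fluidity of movement.

So we decided to simply have the buddy hunker against the cover while the player steers slightly around the buddy.

The logic I used is very simple.

If the point projected from the player's position along the movement direction lands within a box around the buddy's cover,

the buddy cancels its current cover behavior and quickly hunkers against the cover.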

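** = Medic Sidekicks = **

I was responsible for the multiplayer sidekicks, and the medic sidekick is the most special among them.

No NPC in single-player mode behaves the way medic sidekicks do.

They revive downed allies and also mirror the player's cover behavior.

Medic sidekicks try to mirror the player's cover behavior and stay as close to the player as possible,

so when the player is downed, they can quickly run over to revive.

If the player has equipped the RevivePak mod for medic sidekicks,

they throw a RevivePak at the downed revive target before moving in to revive.

RevivePak throwing basically reuses the grenade's trajectory clearance test and throwing animation;

I just swapped the grenade for a RevivePak.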

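** = Stealth Grass = **

Crouch-moving in stealth grass is also a feature new to Uncharted 4.

To implement this feature, we needed some way to mark the environment,

so that the gameplay logic can tell whether the player is in stealth grass.

At first, we had artists mark the surfaces of background models in Maya.

But the communication between artists and designers took too long, making it hard to iterate on levels frequently.

So we decided to mark stealth grass in a different way.

I added an extra stealth grass tag to the nav mesh in the level editor,

so designers can precisely mark stealth grass directly in the editor.

With this extra tag,

we can also use the information to rate stealth posts.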

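** = Perception = **

Uncharted 4 does not have a listen mode like The Last of Us does,

so we had to find another way to let the player know about nearby enemy threats,

so that the player would not feel lost in an unknown hostile environment.

Using the enemies' perception data, I added threat indicators.

When enemies start to notice (white), become suspicious of (yellow), and spot (orange) the player,

these indicators alert the player at the right moments.

In addition, I play a background noise as a threat indicator starts to build up, to create tension.

When the player is spotted, a loud stinger sound is played.

The arrangement and purpose of these sound effects are similar to those in The Last of Us.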

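** = Investigation = **

This is the last feature I was responsible for before we went gold.

I normally don't attend formal meetings at Naughty Dog,

but in the last few months before going gold, we met at least once a week.

The meetings were run by Bruce Straley or Neil Druckmann and focused on the AI aspects of the game.

After almost every meeting, something in the investigation system needed to change.

All in all, it went through several major overhauls.

Two things can make enemies suspicious: the player and dead bodies.

When an enemy becomes suspicious (the distraction spotter), he grabs the nearest ally to investigate together.

The one closer to the distraction becomes the investigator, and the other becomes the watcher.

The distraction spotter may be the investigator, or may be the watcher.

We have two different sets of dialog for the two scenarios

(“There's something over there. I'll go check it out.” vs. “There's something over there. You go check it out.”).

To make paired investigations look more natural,

I used temporal staggering so that the two enemies' actions and threat indicator timings are offset.

Otherwise, the two would behave in perfect sync, which looks very mechanical and unnatural.

If the investigator finds a dead body, he notifies all his allies to start searching for the player.

The dead body is also temporarily highlighted, so the player knows why the enemies went on alert.

Under certain difficulties, triggering investigations consecutively within a short time makes the enemies' senses sharper.

They spot the player more easily, even when the player is hiding in stealth grass.

In crushing mode, enemies are always in the sharpened state.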

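** = Dialog Looks = **

This is also one of the last few features I was responsible for.

The dialog looks system controls characters so they perform small actions during conversations,

such as turning their heads to look at others and making gestures.

Previously, on The Last of Us,

developers spent months manually annotating all of the game's dialog scripts with dialog looks.

We really didn't want to do that kind of grunt work again.

At this stage of development, some dialog scripts had already been manually annotated with dialog looks.

We needed a general-purpose system that automatically generates dialog looks for scripts that were not annotated.

And I was the one responsible for building this dialog looks system.

Animators can tune parameters to change head turn speed, head turn angle, look duration, cooldown time, etc.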

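** = Jeep Momentum Maintenance = **

One of the problems we ran into early in development was the jeep driving level in Madagascar.

When the player's jeep hit a wall or an enemy vehicle, it would spin out and lose speed, falling away from the convoy and failing the level.

My solution was that, when the player's jeep hits a wall or an enemy vehicle,

the jeep's maximum angular velocity and the change in direction of its linear velocity are briefly capped.

This simple approach is quite effective; since then, players have been much less likely to spin out and fail the level.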

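** = Vehicle Deaths = **

Drivable vehicles make their first appearance in Uncharted 4.

Before this, all vehicles were driven by NPCs and traveled along fixed rails.

I was responsible for vehicle deaths.

There are several ways to destroy a vehicle:

killing the driver, shooting the vehicle, knocking an enemy bike away with your vehicle, and ramming an enemy jeep to cause a spin-out.

Based on the manner of death, the vehicle death system picks death animations to play for the vehicle and its passengers.

The death animation gradually blends into the physics-controlled ragdoll system,

so it transitions seamlessly into a physically simulated wreck.

When the player's jeep knocks away an enemy bike,

I use the bike's bounding box projected onto the XZ plane and the contact point

to decide which of the four bump death animations to use.

As for ramming that causes an enemy jeep to spin out,

I compare the enemy jeep's rotational deviation from its desired driving direction against a spin-out threshold.

While a vehicle plays its death animation, it may penetrate walls.

I use a sphere cast from the ideal position toward the vehicle's actual position.

If the cast hits a wall, the vehicle is shifted slightly along the wall's normal.

The error is not corrected all at once, in order to avoid overly drastic displacement.

I also implemented a special type of vehicle death called vehicle death hints.

These death hints are custom death animations placed in the level by animators and designers.

Each death hint has an entry window on the vehicle's rail.

When a vehicle dies within a death hint's entry window, it starts playing that hint's special death animation.

This feature was originally developed for the super cool jeep death animation in the 2015 E3 demo.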

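** = Bayer Matrix for Dithering = **

We wanted to eliminate the artifact of the camera cutting into and seeing through objects, especially the game's many plants.

So we decided to fade out pixels that are close to the camera.

Using semi-transparent pixels is not a good idea, because it is very expensive.

The technique we used is called dithering.

https://en.wikipedia.org/wiki/Dither

Dithering is combined with a Bayer matrix,

using a predetermined dot pattern to decide which pixels can be discarded and not rendered.

https://en.wikipedia.org/wiki/Ordered_dithering

The result is an illusion of semi-transparency.

The Bayer matrix we used at first was an 8×8 matrix taken from the Wikipedia page above.

I thought this matrix was too small and caused unsightly banding artifacts.

I wanted to use a 16×16 Bayer matrix, but could not find any information on it online.

So I tried to reverse engineer the recursive pattern of the 8×8 Bayer matrix.

By inspection alone, I think I could have written out the 16×16 Bayer matrix directly.

But I wanted to make the process a bit more fun.

I wrote a tool that can generate Bayer matrices of any power-of-two size.

After switching to the 16×16 Bayer matrix, the banding artifacts were visibly improved.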

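** = Explosion Sound Delay = **

I didn't really make a big contribution here, but I still think it's worth mentioning.

In the 2015 E3 demo, Nate and Sully received the sound and the sight of the tower explosion at the same time.

This doesn't make sense: the tower is very far away, so the explosion sound should arrive a bit later.

I pointed this out a few weeks before the show, and the art team then added a short delay before the explosion sound.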

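** = Traditional Chinese Localization = **

I didn't switch to Traditional Chinese subtitles in the game until a few weeks before going gold, and I found many errors.

Most of the errors were literal translations from English to Chinese that turned into awkward, out-of-place wording.

I did not think I had enough time to play through the whole game single-handedly while catching translation errors at the same time.

So I asked several people from the QA department to split up the chapters and play in Traditional Chinese mode.

Then I went through their recorded gameplay videos as they came in.

This approach turned out to be quite efficient.

I managed to log the translation errors I found, and the localization team had enough time to fix the translations.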

** = The End = **

Those are my contributions to the development of Uncharted 4 that are worth mentioning.

I hope you enjoyed reading.

This post is part of My Career Series.

Here is the Chinese translation of this post.


Now that Uncharted 4 is released, I am able to talk about what I worked on for the project. I mostly worked on AI for single-player buddies and multiplayer sidekicks, as well as some gameplay logic. I’m leaving out things that never went into the final game and some minor things that are too verbose to elaborate on. So here it goes:

Before I start, I’d like to mention the post system we used for NPCs. I did not work on the core logic of the system; I helped write some client code that makes use of this system.

Posts are discrete positions within navigable space, mostly generated from tools and some hand-placed by designers. Based on our needs, we created various post selectors that rate posts differently (e.g. stealth post selector, combat post selector), and we pick the highest-rated post to tell an NPC to go to.

The buddy follow system was derived from The Last of Us.

The basic idea is that buddies pick positions around the player to follow. These potential positions are fanned out from the player, and must satisfy the following linear path clearance tests: player to position, position to a forward-projected position, forward-projected position to the player.

Climbing is something present in Uncharted 4 that is not in The Last of Us. To incorporate climbing into the follow system, we added the climb follow post selector that picks climb posts for buddies to move to when the player is climbing.

It turned out to be trickier than we thought. Simply telling buddies to use regular follow logic when the player is not climbing, and telling them to use climb posts when the player is climbing, is not enough. If the player quickly switches between climbing and non-climbing states, buddies would oscillate pretty badly between the two states. So we added some hysteresis, where the buddies only switch states when the player has switched states and moved far enough while remaining in that state. In general, hysteresis is a good idea to avoid behavioral flickering.
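To illustrate the idea, here is a minimal C++ sketch of distance-based hysteresis. It is not the shipped code: the `FollowModeHysteresis` class, the `FollowMode` enum, and the per-update `distanceMoved` input are all hypothetical names invented for this example.

```cpp
#include <cassert>

enum class FollowMode { kGround, kClimb };

// Makes the buddy adopt the player's mode only after the player has stayed
// in the new mode while moving at least switchDistance.
class FollowModeHysteresis {
 public:
  explicit FollowModeHysteresis(float switchDistance)
      : m_switchDistance(switchDistance) {
    assert(switchDistance >= 0.0f);
  }

  // Call once per update with the player's current mode and the distance
  // the player moved since the previous update. Returns the buddy's mode.
  FollowMode Update(FollowMode playerMode, float distanceMoved) {
    if (playerMode == m_buddyMode) {
      // The player is back in the buddy's mode; forget any pending switch.
      m_distanceInOtherMode = 0.0f;
    } else {
      m_distanceInOtherMode += distanceMoved;
      if (m_distanceInOtherMode >= m_switchDistance) {
        m_buddyMode = playerMode;
        m_distanceInOtherMode = 0.0f;
      }
    }
    return m_buddyMode;
  }

 private:
  float m_switchDistance;
  float m_distanceInOtherMode = 0.0f;
  FollowMode m_buddyMode = FollowMode::kGround;
};
```

With a switch distance of 2 units, a player who hops onto a ledge, moves 1 unit, and drops back down never causes the buddy to change modes; the buddy only follows suit once the player has covered 2 units without leaving the new state.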

In some scenarios in the game, we wanted buddies to lead the way for the player. The lead system is ported over from The Last of Us and updated, where designers used splines to mark down the general paths we wanted buddies to follow while leading the player.

In case of multiple lead paths through a level, designers would place multiple splines and turn them on and off via script.

The player’s position is projected onto the spline, and a lead reference point is placed ahead by a distance adjustable by designers. When this lead reference point passes a spline control point marked as a wait point, the buddy would go to the next wait point. If the player backtracks, the buddy would only backtrack when the lead reference point gets too far away from the furthest wait point passed during the last advancement. This, again, is hysteresis added to avoid behavioral flickering.

We also incorporated dynamic movement speed into the lead system. “Speed planes” are placed along the spline, based on the distance between the buddy and the player along the spline. There are three motion types NPCs can move in: walk, run, and sprint. Depending on which speed plane the player hits, the buddy picks an appropriate motion type to maintain distance away from the player. Designers can turn on and off speed planes as they see fit. Also, the buddy’s locomotion animation speed is slightly scaled up or down based on the player’s distance to minimize abrupt movement speed change when switching motion types.

In The Last of Us, the player is able to move past a buddy while both remain in cover. This is called cover share.

In The Last of Us, it makes sense for Joel to reach out to the cover wall over Ellie and Tess, who have smaller profiles than Joel. But we thought that it wouldn’t look as good for Nate, Sam, Sully, and Elena, as they all have similar profiles. Plus, Uncharted 4 is much faster-paced, and having Nate reach out his arms while moving in cover would break the fluidity of the movement. So instead, we decided to simply make buddies hunker against the cover wall and have Nate steer slightly around them.

The logic we used is very simple. If the projected player position based on velocity lands within a rectangular boundary around the buddy’s cover post, the buddy aborts its current in-cover behavior and quickly hunkers against the cover wall.
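A test like this can be sketched in a few lines of C++. Everything below is an assumption for illustration only: the function name `ShouldHunker`, the fixed look-ahead time, and especially the axis-aligned rectangle on the XZ plane (a real implementation would more likely orient the rectangle along the cover wall).

```cpp
#include <cassert>
#include <cmath>

struct Vec2 {
  float x, z;  // position on the XZ (ground) plane
};

// Returns true if the player's position, projected forward along the current
// velocity by lookAheadSeconds, lands inside an axis-aligned rectangle
// centered at the buddy's cover post (hypothetical simplification).
bool ShouldHunker(const Vec2& playerPos, const Vec2& playerVel, float lookAheadSeconds,
                  const Vec2& coverPostPos, const Vec2& halfExtents) {
  assert(halfExtents.x >= 0.0f && halfExtents.z >= 0.0f);
  const Vec2 projected = {playerPos.x + playerVel.x * lookAheadSeconds,
                          playerPos.z + playerVel.z * lookAheadSeconds};
  return std::fabs(projected.x - coverPostPos.x) <= halfExtents.x &&
         std::fabs(projected.z - coverPostPos.z) <= halfExtents.z;
}
```

Projecting by velocity rather than testing the current position lets the buddy react slightly before the player actually arrives.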

Medic sidekicks in multiplayer required a whole new behavior that is not present in single-player: reviving downed allies and mirroring the player’s cover behaviors.

Medics try to mimic the player’s cover behavior, and stay as close to the player as possible, so when the player is downed, they are close by to revive the player. If a nearby ally is downed, they would also revive the ally, given that the player is not already downed. If the player is equipped with the RevivePak mod for medics, they would try to throw RevivePaks at revive targets before running to the targets for revival (multiple active revivals reduce revival time); throwing RevivePaks reuses the grenade logic for trajectory clearance test and animation playback, except that grenades were swapped out with RevivePaks.

Crouch-moving in stealth grass is also something new in Uncharted 4. For it to work, we need to somehow mark the environment, so that the player gameplay logic knows whether the player is in stealth grass. Originally, we thought about making the background artists responsible for marking collision surfaces as stealth grass in Maya, but found out that the necessary communication between artists and designers made iteration time too long. So we arrived at a different approach to mark down stealth grass regions. An extra stealth grass tag is added for designers in the editor, so they could mark, with high precision, the nav polys that they’d like the player to treat as stealth grass. With this extra information, we can also rate stealth posts based on whether they are in stealth grass or not. This is useful for buddies moving with the player in stealth.

Since we don’t have listen mode in Uncharted 4 like The Last of Us, we needed to do something to make the player aware of imminent threats, so the player doesn’t feel overwhelmed by unknown enemy locations. Using the enemy perception data, we added the colored threat indicators that inform the player when an enemy is about to notice him/her as a distraction (white), to perceive a distraction (yellow), and to acquire full awareness (orange). We also made the threat indicator raise a buzzing background noise to build up tension and set off a loud stinger when an enemy becomes fully aware of the player, similar to The Last of Us.

This is the last major gameplay feature I took part in before going gold. I don’t usually go to formal meetings at Naughty Dog, but for the last few months before gold, we had at least one meeting per week driven by Bruce Straley or Neil Druckmann, focusing on the AI aspect of the game. After almost every one of these meetings, there was something to be changed and iterated on for the investigation system. We went through many iterations before arriving at what we shipped with the final game.

There are two things that create distractions and would cause enemies to investigate: player presence and dead bodies. When an enemy registers a distraction (distraction spotter), he would try to get a nearby ally to investigate with him as a pair. The closer one to the distraction becomes the investigator, and the other becomes the watcher. The distraction spotter can become an investigator or a watcher, and we set up different dialog sets for both scenarios (“There’s something over there. I’ll check it out.” versus “There’s something over there. You go check it out.”).

In order to make the start and end of investigation look more natural, we staggered the timing of enemy movement and the fading of threat indicators, so the investigation pair don’t perform the exact same action at the same time in a mechanical fashion.

If the distraction is a dead body, the investigator would be alerted of player presence and tell everyone else to start searching for the player, irreversibly leaving ambient/unaware state. The dead body discovered would also be highlighted, so the player gets a chance to know what gave him/her away.

Under certain difficulties, consecutive investigations would make enemies investigate more aggressively, having a better chance of spotting the player hidden in stealth grass. In crushing difficulty, enemies always investigate aggressively.

This is also among the last few things I helped out with for this project.

Dialog looks refers to the logic that makes characters react to conversations, such as looking at the other people and hand gestures. Previously in The Last of Us, people spent months annotating all in-game scripted dialogs with looks and gestures by hand. This was something we didn’t want to do again. We had some scripted dialogs that are already annotated by hand, but we needed a default system that handles dialogs that are not annotated. The animators are given parameters to adjust the head turn speed, max head turn angle, look duration, cool down time, etc.

One of the problems we had early on regarding the jeep driving section in the Madagascar city level, is that the player’s jeep can easily spin out and lose momentum after hitting a wall or an enemy vehicle, throwing the player far behind the convoy and failing the level.

My solution was to temporarily cap the angular velocity and the change of linear velocity direction upon impact against walls and enemy vehicles. This easy solution turned out to be pretty effective, making it much harder for players to fail the level due to spin-outs.

Drivable vehicles are first introduced in Uncharted 4. Previously, only NPCs could drive vehicles, and those vehicles were constrained to spline rails. I helped handle vehicle deaths.
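As a rough sketch of what such caps could look like, here is a simplified 2D version in C++ working on the ground plane. The function names, the per-step `maxTurnAngle` parameter, and the choice to keep the new speed while limiting only the direction change are assumptions for illustration; the actual vehicle code is not shown in this post.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec2 {
  float x, z;  // components on the XZ (ground) plane
};

// Clamps the yaw rate to [-maxAngularSpeed, maxAngularSpeed].
float CapAngularSpeed(float angularSpeed, float maxAngularSpeed) {
  assert(maxAngularSpeed >= 0.0f);
  return std::max(-maxAngularSpeed, std::min(angularSpeed, maxAngularSpeed));
}

// Limits how far the direction of newVel may rotate away from the direction
// of oldVel (in radians), while keeping the speed of newVel.
Vec2 CapDirectionChange(const Vec2& oldVel, const Vec2& newVel, float maxTurnAngle) {
  assert(maxTurnAngle >= 0.0f);
  const float oldSpeedSq = oldVel.x * oldVel.x + oldVel.z * oldVel.z;
  if (oldSpeedSq <= 0.0f) {
    return newVel;  // no previous direction to compare against
  }
  // Signed angle from the old direction to the new direction.
  const float cross = oldVel.x * newVel.z - oldVel.z * newVel.x;
  const float dot = oldVel.x * newVel.x + oldVel.z * newVel.z;
  const float turn = std::atan2(cross, dot);
  const float cappedTurn = std::max(-maxTurnAngle, std::min(turn, maxTurnAngle));

  const float newSpeed = std::sqrt(newVel.x * newVel.x + newVel.z * newVel.z);
  const float angle = std::atan2(oldVel.z, oldVel.x) + cappedTurn;
  return {newSpeed * std::cos(angle), newSpeed * std::sin(angle)};
}
```

In this sketch the caps would only be applied for a short window after an impact, so normal steering is unaffected the rest of the time.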

There are multiple ways to kill enemy vehicles: kill the driver, shoot the vehicle enough times, bump into an enemy bike with your jeep, and ram your jeep into an enemy jeep to cause a spin-out. Based on various causes of death, a death animation is picked to play for the dead vehicle and all its passengers. The animation blends into physics-controlled ragdolls, so the death animation smoothly transitions into physically simulated wreckage.

For bumped deaths of enemy bikes, we used the bike’s bounding box on the XZ plane and the contact position to determine which one of the four directional bump death animations to play.
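One plausible way to make this choice is sketched below in C++. It assumes the contact position has already been transformed into the bike's local XZ frame (x to the bike's right, z forward) and that the bounding box is centered on the bike; the `ClassifyBump` function and the `BumpDirection` enum are hypothetical names, not the game's actual API.

```cpp
#include <cassert>
#include <cmath>

enum class BumpDirection { kFront, kBack, kLeft, kRight };

struct Vec2 {
  float x, z;  // local XZ coordinates: x = right, z = forward (assumed convention)
};

// Picks one of four directions by normalizing the local contact position by
// the bounding box half extents and checking which axis dominates.
BumpDirection ClassifyBump(const Vec2& localContact, const Vec2& halfExtents) {
  assert(halfExtents.x > 0.0f && halfExtents.z > 0.0f);
  const float side = localContact.x / halfExtents.x;
  const float along = localContact.z / halfExtents.z;
  if (std::fabs(along) >= std::fabs(side)) {
    return along >= 0.0f ? BumpDirection::kFront : BumpDirection::kBack;
  }
  return side >= 0.0f ? BumpDirection::kRight : BumpDirection::kLeft;
}
```

Normalizing by the half extents matters because a bike is much longer than it is wide; without it, almost every contact would be classified as front or back.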

As for jeep spin-outs, the jeep’s rotational deviation from desired driving direction is tested against a spin-out threshold.

When playing death animations, there’s a chance that the dead vehicle can penetrate walls. A sphere cast is used, from the vehicle’s ideal position along the rail if it weren’t dead, to where the vehicle’s body actually is. If a contact is generated from the sphere cast, the vehicle is shifted in the direction of the contact normal by a fraction of penetration amount, so the de-penetration happens gradually across multiple frames, avoiding positional pops.
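The gradual correction can be sketched as a single per-frame step in C++. The sphere cast itself is left out; the contact normal and penetration depth are assumed to come from whatever collision query the engine provides, and `DepenetrateStep` and its `fraction` parameter are hypothetical names for this illustration.

```cpp
#include <cassert>

struct Vec3 {
  float x, y, z;
};

// Shifts the vehicle along the contact normal by a fraction of the current
// penetration depth. Called once per frame while a contact is reported.
Vec3 DepenetrateStep(const Vec3& position, const Vec3& contactNormal,
                     float penetrationDepth, float fraction) {
  assert(penetrationDepth >= 0.0f);
  assert(fraction > 0.0f && fraction <= 1.0f);
  const float shift = penetrationDepth * fraction;
  return {position.x + contactNormal.x * shift,
          position.y + contactNormal.y * shift,
          position.z + contactNormal.z * shift};
}
```

With a fraction of 0.25, for example, each frame removes a quarter of the remaining penetration, so the error shrinks geometrically instead of popping the vehicle out in one frame.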

We made a special type of vehicle death, called vehicle death hint. They are context-sensitive death animations that interact with environments. Animators and designers place these hints along the spline rail, and specify entry windows on the splines. If a vehicle is killed within an entry window, it starts playing the corresponding special death animation. This feature started off as a tool to implement the specific epic jeep kill in the 2015 E3 demo.

We wanted to eliminate geometry clipping through the camera when the camera gets too close to environmental objects, mostly foliage. So we decided to fade out pixels in pixel shaders based on how close the pixels are to the camera. Using transparency was not an option, because transparency is not cheap, and there’s just too much foliage. Instead, we went with dithering: by combining a pixel’s distance from the camera with a patterned Bayer matrix, some portion of the pixels are fully discarded, creating an illusion of transparency.

Our original Bayer matrix was an 8×8 matrix shown on the Wikipedia page on ordered dithering. I thought it was too small and resulted in banding artifacts. I wanted to use a 16×16 Bayer matrix, but it was nowhere to be found on the internet. So I tried to reverse engineer the pattern of the 8×8 Bayer matrix and noticed a recursive pattern. I would have been able to just use pure inspection to write out a 16×16 matrix by hand, but I wanted to have more fun and wrote a tool that can generate Bayer matrices of any power-of-2 size.
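Here is a minimal C++ sketch of such a generator. It uses the recursion commonly given for ordered-dithering matrices, where each level replaces a matrix $M$ with the 2×2 block layout $[4M,\ 4M+2;\ 4M+3,\ 4M+1]$; whether the original tool used exactly this form is an assumption. The `ShouldDiscard` helper is likewise a hypothetical CPU-side illustration of how a shader might compare an opacity value against the matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Builds the (2^log2Size) x (2^log2Size) Bayer index matrix, with entries
// ranging from 0 to size*size - 1.
Matrix MakeBayerMatrix(int log2Size) {
  assert(log2Size >= 0);
  Matrix m(1, std::vector<int>(1, 0));
  for (int level = 0; level < log2Size; ++level) {
    const std::size_t n = m.size();
    Matrix next(2 * n, std::vector<int>(2 * n, 0));
    for (std::size_t r = 0; r < n; ++r) {
      for (std::size_t c = 0; c < n; ++c) {
        const int base = 4 * m[r][c];
        next[r][c] = base;              // top-left block
        next[r][c + n] = base + 2;      // top-right block
        next[r + n][c] = base + 3;      // bottom-left block
        next[r + n][c + n] = base + 1;  // bottom-right block
      }
    }
    m = std::move(next);
  }
  return m;
}

// Returns true if the pixel at (px, py) should be discarded for the given
// opacity in [0, 1]. Lower opacity discards more pixels.
bool ShouldDiscard(int px, int py, float opacity, const Matrix& bayer) {
  assert(px >= 0 && py >= 0 && !bayer.empty());
  const std::size_t n = bayer.size();
  const int index = bayer[static_cast<std::size_t>(py) % n][static_cast<std::size_t>(px) % n];
  const float threshold = (static_cast<float>(index) + 0.5f) / static_cast<float>(n * n);
  return opacity < threshold;
}
```

`MakeBayerMatrix(1)` yields the familiar 2×2 matrix `{{0, 2}, {3, 1}}`, `MakeBayerMatrix(3)` gives an 8×8 matrix, and `MakeBayerMatrix(4)` gives the 16×16 one.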

After switching to the 16×16 Bayer matrix, there was a noticeable improvement on banding artifacts.

This is a really minor contribution, but I’d still like to mention it. A couple of weeks before the 2015 E3 demo, I pointed out that the tower explosion was seen and heard simultaneously, which didn’t make sense. Nate and Sully are very far away from the tower, so they should have seen the explosion first and then heard it shortly after. The art team added a slight delay to the explosion sound in the final demo.

I didn’t switch to Traditional Chinese text and subtitles until two weeks before we were locking down for gold, and I found some translation errors. Most of the errors were literal translations from English to Traditional Chinese and just didn’t work in the contexts. I did not think I would have time to play through the entire game myself and look out for translation errors simultaneously. So I asked multiple people from QA to play through different chapters of the game in Traditional Chinese, and I went over the recorded gameplay videos as they became available. This proved pretty efficient; I managed to log all the translation errors I found, and the localization team was able to correct them before the deadline.

These are pretty much the things I worked on for Uncharted 4 that are worth mentioning. I hope you enjoyed reading it.
