It occurred to me that the entire time I’ve been working with quaternions, I have never read or learned about the derivation of the formula for slerp, spherical linear interpolation. I just learned the final formula and have been using it.

Upon a preliminary search I couldn’t seem to immediately find a straightforward derivation, either (at least not one that fits in the context of game development). So I thought it might be a fun exercise to derive it myself.

As it turns out, it is indeed fun and could probably serve as an interesting trigonometry & vector quiz question!

A quick recap: slerp is an operation that interpolates between two vectors along the shortest arc (in any dimension higher than 1D). It takes as input the two vectors to interpolate between, plus an interpolation parameter t:

slerp(v₀, v₁; t) = (sin((1 − t)θ) / sin θ) v₀ + (sin(tθ) / sin θ) v₁

where θ is the angle between the two vectors:

θ = cos⁻¹(v₀ · v₁ / (‖v₀‖ ‖v₁‖))

If the interpolation parameter changes at a constant rate, the angular velocity of the slerp result is also constant. If t is set to 0.25, it means the slerp result is “the 25% waypoint” on the arc from v₀ to v₁: the angle between v₀ and the slerp result is 0.25θ, and the angle between v₁ and the slerp result is 0.75θ.

In the context of game development, slerp is typically used to interpolate between orientations represented by quaternions, which can be expressed as 4D vectors. In this case the shortest arc slerp interpolates across lies on a 4D hypersphere.

As mentioned before, this formula can be used on any vectors in any dimension higher than 1D. So it can also be used to interpolate between two 3D vectors along a sphere, or between two 2D vectors along a circle.
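To make this concrete, here's a minimal Python sketch of vector slerp (my own illustration for this post, not its source code), applied to 2D unit vectors along the unit circle:

```python
import math

def slerp(v0, v1, t):
    # assumes v0 and v1 are unit vectors of the same dimension
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(v0, v1))))
    theta = math.acos(dot)
    if theta < 1e-9:  # vectors (nearly) coincide; fall back to v0
        return list(v0)
    s = math.sin(theta)
    k0 = math.sin((1.0 - t) * theta) / s
    k1 = math.sin(t * theta) / s
    return [k0 * a + k1 * b for a, b in zip(v0, v1)]

# interpolate between two 2D unit vectors along the unit circle
a = [1.0, 0.0]
b = [0.0, 1.0]
mid = slerp(a, b, 0.5)  # halfway along a 90-degree arc: 45 degrees
```

The same function works unchanged on 3D vectors (along a sphere) or 4D quaternion components (along a hypersphere).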

In the context of game development, we almost exclusively work with unit quaternions. So in my derivation, I make the assumption that the vectors we are working with are all unit vectors. The flow of the derivation should be pretty much the same even if the vectors are not unit vectors.

Without further ado, here’s the derivation.

Let vₜ denote the result of slerp:

vₜ = slerp(v₀, v₁; t)

And let θ be the angle between v₀ and v₁.

Knowing that the angle between v₀ and vₜ is tθ, and the angle between vₜ and v₁ is (1 − t)θ, we can come up with this figure:

Here’s the strategy. We build a pair of orthogonal unit axes x̂ and ŷ from v₀ and v₁. Then, we use the parametric circle formula to find vₜ:

vₜ = cos(tθ) x̂ + sin(tθ) ŷ

Since v₀ is already a unit vector that conveniently lies on the horizontal axis in the figure, let’s just pick x̂ = v₀. Then ŷ can be found by taking away the component in v₁ that is parallel to v₀ and normalizing the remainder:

ŷ = (v₁ − (v₀ · v₁) v₀) / ‖v₁ − (v₀ · v₁) v₀‖ = (v₁ − (cos θ) v₀) / sin θ

Now plug x̂ and ŷ back into the parametric circle formula:

vₜ = cos(tθ) v₀ + sin(tθ) (v₁ − (cos θ) v₀) / sin θ
  = ((cos(tθ) sin θ − sin(tθ) cos θ) / sin θ) v₀ + (sin(tθ) / sin θ) v₁
  = (sin((1 − t)θ) / sin θ) v₀ + (sin(tθ) / sin θ) v₁

And voila! We have our slerp formula.
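The derivation can also be checked numerically: build x̂ and ŷ, apply the parametric circle formula, and compare against the closed-form slerp formula. Here's a small Python sketch (my own, for verification only):

```python
import math

def slerp_via_axes(v0, v1, t):
    # x-hat = v0; y-hat = normalized remainder of v1 after removing
    # its component parallel to v0
    cos_theta = sum(a * b for a, b in zip(v0, v1))
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    x_hat = v0
    y_raw = [b - cos_theta * a for a, b in zip(v0, v1)]
    norm = math.sqrt(sum(c * c for c in y_raw))  # equals sin(theta)
    y_hat = [c / norm for c in y_raw]
    # parametric circle formula
    return [math.cos(t * theta) * x + math.sin(t * theta) * y
            for x, y in zip(x_hat, y_hat)]

def slerp_closed_form(v0, v1, t):
    cos_theta = sum(a * b for a, b in zip(v0, v1))
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    s = math.sin(theta)
    return [(math.sin((1 - t) * theta) * a + math.sin(t * theta) * b) / s
            for a, b in zip(v0, v1)]
```

Both functions agree for any pair of non-parallel unit vectors, which is exactly what the derivation claims.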

*Edit: Eric Lengyel has pointed out there’s another way to derive the slerp formula using similar triangles, presented in his Mathematics for 3D Game Programming and Computer Graphics, 3rd ed., Section 4.6.3.*

Source files are on GitHub.

Shortcut to sterp implementation.

Shortcut to code used to generate animations in this post.

Slerp, spherical linear interpolation, is an operation that interpolates from one orientation to another, using a rotational axis paired with the smallest angle possible.

Quick note: Jonathan Blow explains here how you should avoid using slerp if normalized quaternion linear interpolation (nlerp) suffices. Long story short: nlerp is faster but does not maintain constant angular velocity, while slerp is slower but maintains constant angular velocity. Use nlerp if you’re interpolating across small angles or you don’t care about constant angular velocity; use slerp if you’re interpolating across large angles and you care about constant angular velocity. But for the sake of using a more commonly known building block, the rest of this post will only mention slerp. Replacing all following occurrences of slerp with nlerp would not change the validity of this post.
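To see the angular-velocity difference concretely, here's a small Python sketch comparing slerp and nlerp on 2D unit vectors (my own illustration, not from Jonathan Blow's post):

```python
import math

def slerp2(v0, v1, t):
    theta = math.acos(max(-1.0, min(1.0, v0[0]*v1[0] + v0[1]*v1[1])))
    s = math.sin(theta)
    k0, k1 = math.sin((1 - t) * theta) / s, math.sin(t * theta) / s
    return (k0 * v0[0] + k1 * v1[0], k0 * v0[1] + k1 * v1[1])

def nlerp2(v0, v1, t):
    # lerp the components, then renormalize
    x, y = (1 - t) * v0[0] + t * v1[0], (1 - t) * v0[1] + t * v1[1]
    n = math.hypot(x, y)
    return (x / n, y / n)

a, b = (1.0, 0.0), (0.0, 1.0)  # 90 degrees apart

def angle_from_a(v):
    return math.atan2(v[1], v[0])

# slerp: equal angular steps; nlerp: steps bunch up in the middle of the arc
slerp_steps = [angle_from_a(slerp2(a, b, t / 4)) for t in range(5)]
nlerp_steps = [angle_from_a(nlerp2(a, b, t / 4)) for t in range(5)]
```

At t = 0.25 slerp is exactly a quarter of the way along the arc (22.5°), while nlerp lags behind (about 18.4°), then speeds up through the middle; across small angles the two are nearly indistinguishable.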

In general, slerp is considered superior to interpolating individual components of Euler angles, as the latter method usually yields orientational sways.

But, sometimes slerp might not be ideal. Look at the image below showing two different orientations of a rod. On the left is one orientation, and on the right is the resulting orientation of rotating around the axis shown as a cyan arrow, where the pivot is at one end of the rod.

If we slerp between the two orientations, this is what we get:

Mathematically, slerp takes the “shortest rotational path”. The quaternion representing the rod’s orientation travels along the shortest arc on a 4D hypersphere. But, given the rod’s elongated appearance, the rod’s moving end seems to be deviating from the shortest arc on a 3D sphere.

My intended effect here is for the rod’s moving end to travel along the shortest arc in 3D, like this:

The difference is more obvious if we compare them side-by-side:

This is where swing-twist decomposition comes in.

Swing-Twist decomposition is an operation that splits a rotation into two concatenated rotations, swing and twist. Given a twist axis, we would like to separate out the portion of a rotation that contributes to the twist around this axis, and what’s left behind is the remaining swing portion.

There are multiple ways to derive the formulas, but this particular one by Michaele Norel seems to be the most elegant and efficient, and it’s the only one I’ve come across that does not involve any trigonometric functions. I will show the formulas first and then paraphrase his proof later:

Given a rotation represented by a quaternion q = (w, v), where w is the scalar part and v is the vector part, and a unit twist axis â, combine the scalar part w with the projection of v onto â to form a new quaternion — this is the twist component:

q_twist = (w, (v · â) â)

We want to decompose q into a swing component and a twist component. Let q_swing denote the swing component, so we can write q = q_swing q_twist. The swing component is then calculated by multiplying q with the inverse (conjugate) of q_twist:

q_swing = q q_twist⁻¹

Beware that q_swing and q_twist are not yet normalized at this point. It’s a good idea to normalize them before use, as unit quaternions are just cuter.

Below is my code implementation of swing-twist decomposition. Note that it also takes care of the singularity that occurs when the rotation to be decomposed represents a 180-degree rotation.

```csharp
public static void DecomposeSwingTwist
(
  Quaternion q,
  Vector3 twistAxis,
  out Quaternion swing,
  out Quaternion twist
)
{
  Vector3 r = new Vector3(q.x, q.y, q.z);

  // singularity: rotation by 180 degrees
  if (r.sqrMagnitude < MathUtil.Epsilon)
  {
    Vector3 rotatedTwistAxis = q * twistAxis;
    Vector3 swingAxis = Vector3.Cross(twistAxis, rotatedTwistAxis);

    if (swingAxis.sqrMagnitude > MathUtil.Epsilon)
    {
      float swingAngle = Vector3.Angle(twistAxis, rotatedTwistAxis);
      swing = Quaternion.AngleAxis(swingAngle, swingAxis);
    }
    else
    {
      // more singularity: rotation axis parallel to twist axis
      swing = Quaternion.identity; // no swing
    }

    // always twist 180 degrees on singularity
    twist = Quaternion.AngleAxis(180.0f, twistAxis);
    return;
  }

  // meat of swing-twist decomposition
  Vector3 p = Vector3.Project(r, twistAxis);
  twist = new Quaternion(p.x, p.y, p.z, q.w);
  twist = Normalize(twist);
  swing = q * Quaternion.Inverse(twist);
}
```

Now that we have the means to decompose a rotation into swing and twist components, we need a way to use them to interpolate the rod’s orientation, replacing slerp.

Replacing slerp with the swing and twist components is actually pretty straightforward. Let q_a and q_b denote the quaternions representing the rod’s two orientations we are interpolating between. Given the interpolation parameter t, we use it to find “fractions” of the swing and twist components and combine them together. Such fractions can be obtained by performing slerp from the identity quaternion, q_I, to the individual components.

So we replace:

slerp(q_a, q_b, t)

with:

slerp(q_I, q_twist, t) slerp(q_I, q_swing, t) q_a

where q_swing and q_twist are the swing-twist decomposition of the delta rotation q_b q_a⁻¹.

From the rod example, we choose the twist axis to align with the rod’s longest side. Let’s look at the effect of the individual components slerp(q_I, q_swing, t) and slerp(q_I, q_twist, t) as t varies over time below, swing on the left and twist on the right:

And as we concatenate these two components together, we get a swing-twist interpolation that rotates the rod such that its moving end travels in the shortest arc in 3D. Again, here is a side-by-side comparison of slerp (left) and swing-twist interpolation (right):

I decided to name my swing-twist interpolation function **sterp**. I think it’s cool because it sounds like it belongs to the function family of **lerp** and **slerp**. Here’s to hoping that this name catches on.

And here’s my code implementation:

```csharp
public static Quaternion Sterp
(
  Quaternion a,
  Quaternion b,
  Vector3 twistAxis,
  float t
)
{
  Quaternion deltaRotation = b * Quaternion.Inverse(a);

  Quaternion swingFull;
  Quaternion twistFull;
  QuaternionUtil.DecomposeSwingTwist
  (
    deltaRotation,
    twistAxis,
    out swingFull,
    out twistFull
  );

  Quaternion swing = Quaternion.Slerp(Quaternion.identity, swingFull, t);
  Quaternion twist = Quaternion.Slerp(Quaternion.identity, twistFull, t);

  return twist * swing;
}
```

Lastly, let’s look at the proof for the swing-twist decomposition formulas. All that needs to be proven is that the swing component does not contribute to any rotation around the twist axis, i.e. the rotational axis of q_swing is orthogonal to the twist axis â.

Let v∥ denote the component of v parallel to â, which can be obtained by projecting v onto â:

v∥ = (v · â) â

Let v⊥ denote the component of v orthogonal to â:

v⊥ = v − v∥

So the scalar-vector form of q becomes:

q = (w, v∥ + v⊥)

and the twist component is simply q_twist = (w, v∥).

Using the quaternion multiplication formula, here is the scalar-vector form of the swing quaternion (using the conjugate (w, −v∥) for the inverse, which differs only by a positive scale factor that doesn’t affect the rotational axis):

q_swing = q q_twist⁻¹ = (w, v∥ + v⊥)(w, −v∥) = (w² + v∥ · v∥, w v⊥ + v∥ × v⊥)

Take notice of the vector part of the result:

w v⊥ + v∥ × v⊥

This is a vector parallel to the rotational axis of q_swing. Both w v⊥ and v∥ × v⊥ are orthogonal to the twist axis â, so we have shown that the rotational axis of q_swing is orthogonal to the twist axis. Hence, we have proven that the formulas for q_swing and q_twist are valid for swing-twist decomposition.
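As a numeric sanity check of the decomposition, here's a small Python sketch (quaternions as (w, x, y, z) tuples; this is my own illustration, not the post's C# code, and it skips the 180-degree singularity handling):

```python
import math

def qmul(a, b):
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw*bw - ax*bx - ay*by - az*bz,
            aw*bx + bw*ax + ay*bz - az*by,
            aw*by + bw*ay + az*bx - ax*bz,
            aw*bz + bw*az + ax*by - ay*bx)

def qconj(q):
    return (q[0], -q[1], -q[2], -q[3])

def qnormalize(q):
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)

def axis_angle(axis, deg):
    h = math.radians(deg) * 0.5
    s = math.sin(h)
    return (math.cos(h), axis[0]*s, axis[1]*s, axis[2]*s)

def swing_twist(q, twist_axis):
    # twist: scalar part of q combined with the projection of
    # q's vector part onto the (unit) twist axis, then normalized
    w, x, y, z = q
    d = x*twist_axis[0] + y*twist_axis[1] + z*twist_axis[2]
    twist = qnormalize((w, d*twist_axis[0], d*twist_axis[1], d*twist_axis[2]))
    swing = qmul(q, qconj(twist))  # q = swing * twist
    return swing, twist

# rotation: 40 degrees about z composed with 70 degrees about x;
# twist axis = x, so we expect swing = Rz(40), twist = Rx(70)
q = qnormalize(qmul(axis_angle((0, 0, 1), 40), axis_angle((1, 0, 0), 70)))
swing, twist = swing_twist(q, (1.0, 0.0, 0.0))
```

The recomposed product swing · twist reproduces q exactly, and the swing quaternion's vector part has zero component along the twist axis, matching the proof.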

That’s all.

Given a twist axis, I have shown how to decompose a rotation into a swing component and a twist component.

Such decomposition can be used for swing-twist interpolation, an alternative to slerp that interpolates between two orientations, which can be useful if you’d like some point on a rotating object to travel along the shortest arc.

I like to call such interpolation **sterp**.

Sterp is merely an alternative to slerp, not a replacement. Also, slerp is definitely more efficient than sterp. Most of the time slerp should work just fine, but if you find unwanted orientational sway on an object’s moving end, you might want to give sterp a try.

An application of swing-twist decomposition in 2D just came to mind.

If the twist axis is chosen to be orthogonal to the screen, then we can utilize swing-twist decomposition to use the orientation of objects in 3D to drive the rotation of 2D elements in screen space or some other data. The twist component represents exactly the portion of 3D rotation projected onto screen space.

However, in terms of performance, we might be better off just projecting a 3D object’s local axis onto screen space and finding the angle between it and a screen-space axis. But then again, the swing-twist decomposition approach doesn’t have the singularity the projection approach has when the chosen local axis becomes orthogonal to the screen.

This post is part of my Game Programming Series.

Complete source code for the debug draw utility and Unity scene for generating the demo animation above can be found on GitHub. Here is a shortcut to the debug draw utility class. And here is a shortcut to the shaders.

A couple weeks ago, I documented how I implemented a wireframe Unity debug draw utility using cached mesh pools and vertex shaders.

Recently, I have upgraded the utility to now support various shaded styles, including solid color, flat-shaded, and smooth-shaded. This post is a documentation of my development process and how I solved some of the challenges on the way.

For each mesh rendered in wireframe style, the original mesh factory only needed to generate an array of unique vertices, along with an index array containing the vertex indices in either lines or line strip topology.

To generate a mesh to be rendered in solid color style, I reused the same unique vertex arrays, but the index arrays had to be changed to contain vertex indices in triangle topology, three indices per triangle.

Once the generation of meshes for solid color style was done, I decided counter-intuitively to first implement the “fancier” smooth-shaded style before the flat-shaded style, because the former was actually an easier incremental change from the solid color style. Taking spheres for example, the vertex array actually still didn’t need to be changed; I just had to create an array of normals that is an exact copy of the vertices. Recall from the previous post that in order to reduce the number of cached meshes, I offloaded scaling to the vertex shaders and just generated meshes that are unit primitives. The normal of a vertex of a smooth-shaded unit sphere is just conveniently identical to the vertex positional vector.

Figuring out the index arrays for other smooth-shaded primitive meshes wasn’t as straightforward as spheres, but it wasn’t too hard either. I still didn’t need to change most of the vertex arrays and just had to figure out the proper accompanying normal array and index array. Cones were a notable exception, because even with smooth-shaded style, they still have some normal discontinuity along the base edges, which required duplicates of the base edge vertices with different normals.

Finally moving onto the flat-shaded style, most primitives required me to modify the generation of vertex arrays, normal arrays, and index arrays. Arrays of unique vertices no longer worked, because a vertex shared by multiple faces (triangles, quads, circles, etc.) would have a different normal on each face. For each face, a new set of vertices had to be put into the vertex array. Different primitives required slightly different techniques to generate the vertices for each face. Taking spheres for example again, for each longitudinal strip, two triangles connecting to the poles plus two triangles per quad along the strip were needed. The normals were simply computed with cross products of any two non-parallel vectors connecting vertices in each face.

I generally followed this pattern for triangles:

```csharp
Vector3[] aVert = new Vector3[numVerts];
Vector3[] aNormal = new Vector3[numNormals];
int[] aIndex = new int[numIndices];

int iVert = 0;
int iNormal = 0;
int iIndex = 0;
for (int i = 0; i < numIterations; ++i)
{
  int iTriStart = iVert;
  aVert[iVert++] = ComputeTriVert0(i);
  aVert[iVert++] = ComputeTriVert1(i);
  aVert[iVert++] = ComputeTriVert2(i);

  Vector3 tri01 = aVert[iTriStart + 1] - aVert[iTriStart];
  Vector3 tri02 = aVert[iTriStart + 2] - aVert[iTriStart];
  Vector3 triNormal = Vector3.Cross(tri01, tri02).normalized;

  aNormal[iNormal++] = triNormal;
  aNormal[iNormal++] = triNormal;
  aNormal[iNormal++] = triNormal;

  aIndex[iIndex++] = iTriStart;
  aIndex[iIndex++] = iTriStart + 1;
  aIndex[iIndex++] = iTriStart + 2;
}
```

And this pattern for quads:

```csharp
Vector3[] aVert = new Vector3[numVerts];
Vector3[] aNormal = new Vector3[numNormals];
int[] aIndex = new int[numIndices];

int iVert = 0;
int iNormal = 0;
int iIndex = 0;
for (int i = 0; i < numIterations; ++i)
{
  int iQuadStart = iVert;
  aVert[iVert++] = ComputeQuadVert0(i);
  aVert[iVert++] = ComputeQuadVert1(i);
  aVert[iVert++] = ComputeQuadVert2(i);
  aVert[iVert++] = ComputeQuadVert3(i);

  Vector3 quad01 = aVert[iQuadStart + 1] - aVert[iQuadStart];
  Vector3 quad02 = aVert[iQuadStart + 2] - aVert[iQuadStart];
  Vector3 quadNormal = Vector3.Cross(quad01, quad02).normalized;

  aNormal[iNormal++] = quadNormal;
  aNormal[iNormal++] = quadNormal;
  aNormal[iNormal++] = quadNormal;
  aNormal[iNormal++] = quadNormal;

  aIndex[iIndex++] = iQuadStart;
  aIndex[iIndex++] = iQuadStart + 1;
  aIndex[iIndex++] = iQuadStart + 2;
  aIndex[iIndex++] = iQuadStart;
  aIndex[iIndex++] = iQuadStart + 2;
  aIndex[iIndex++] = iQuadStart + 3;
}
```

The positional portion of the vertex shader for all styles is actually identical, so I wanted to find a way to avoid creating an extra set of vertex and fragment shaders just to add the logic for normals. Then I found out about Unity’s shader variant feature. By using the `shader_feature` keyword and `#ifdef`s in the shaders, combined with the `Material.EnableKeyword` method, I was able to choose from a collection of variants generated from a single master shader at run time for each primitive mesh type. I used the `NORMAL_ON` keyword for the normal feature.

As shown below, only when the `NORMAL_ON` keyword is enabled are normals included in the vertex structs.

```hlsl
#pragma shader_feature NORMAL_ON

struct appdata
{
  float4 vertex : POSITION;
  #ifdef NORMAL_ON
  float3 normal : NORMAL;
  #endif
};

struct v2f
{
  float4 vertex : SV_POSITION;
  #ifdef NORMAL_ON
  float3 normal : NORMAL;
  #endif
};
```

The model-view matrix is used to transform vertex positions from object space into view space, but normals need to be transformed using the inverse transpose of the model-view matrix. Since the scaling is offloaded to the shader, I needed to fold in the scaling portion of the inverse transpose of the model-view matrix myself.

```hlsl
v2f vert (appdata v)
{
  v2f o;

  // ...

  #ifdef NORMAL_ON
  float4x4 scaleInverseTranspose = float4x4
  (
    1.0f / _Dimensions.x, 0.0f, 0.0f, 0.0f,
    0.0f, 1.0f / _Dimensions.y, 0.0f, 0.0f,
    0.0f, 0.0f, 1.0f / _Dimensions.z, 0.0f,
    0.0f, 0.0f, 0.0f, 1.0f
  );
  float4x4 m = mul(UNITY_MATRIX_IT_MV, scaleInverseTranspose);
  o.normal = mul(m, float4(v.normal, 0.0f)).xyz;
  #endif

  return o;
}
```

I also used the `shader_feature` keyword to optionally activate the “cap shift/scaling” logic for cylinders and capsules. Recall from the previous post that in order not to generate a mesh for each possible height, only unit-height cylinder and capsule meshes are generated, and the caps are shifted towards the X-Z plane, scaled, and then shifted back to the final height. I used the `CAP_SHIFT_SCALE` keyword for this feature.

```hlsl
#pragma shader_feature CAP_SHIFT_SCALE

// (x, y, z) == (dimensionX, dimensionY, dimensionZ)
// w == capShiftScale
// shift 0.5 towards the X-Z plane, scale by dimensions,
// and then shift back 0.5 * capShiftScale
float4 _Dimensions;

v2f vert (appdata v)
{
  v2f o;

  #ifdef CAP_SHIFT_SCALE
  const float ySign = sign(v.vertex.y);
  v.vertex.y -= ySign * 0.5f;
  #endif

  v.vertex.xyz *= _Dimensions.xyz;

  #ifdef CAP_SHIFT_SCALE
  v.vertex.y += ySign * 0.5f * _Dimensions.w;
  #endif

  o.vertex = UnityObjectToClipPos(v.vertex);

  // ...

  return o;
}
```

I noticed some Z-fighting between the two styles when I drew the same meshes twice, once in wireframe style and once in shaded style. It was actually an easy fix. I just added a small Z-bias to make sure the wireframe lines are always drawn in front of the shaded pixels.

```hlsl
float _ZBias;

v2f vert (appdata v)
{
  v2f o;

  // ...

  o.vertex = UnityObjectToClipPos(v.vertex);
  o.vertex.z += _ZBias;

  // ...

  return o;
}
```

And finally here’s the fragment shader. It really doesn’t contain anything out of the ordinary, except that it remaps the vertex brightness from (0.0, 1.0) to (0.3, 1.0), because I really don’t like completely black pixels.

```hlsl
fixed4 frag (v2f i) : SV_Target
{
  fixed4 color = _Color;

  #ifdef NORMAL_ON
  i.normal = normalize(i.normal);
  color.rgb *= 0.7f * i.normal.z + 0.3f; // darkest at 0.3f
  #endif

  return color;
}
```

That’s it! I am pretty satisfied with the current Unity debug draw utility. It’s also easy to combine primitives to make more interesting shapes, such as the arrows shown in the demo animation above.

Potentially, the meshes for flat-shaded and smooth-shaded styles, generated from the mesh factory, can be used to implement a gizmo utility. But I’ll probably only do it when I really need it.

Stay tuned for more documentation of my future venture into Unity land.

Until next time!

This post is part of my Game Programming Series.

Complete source code for the debug draw utility and Unity scene for generating the demo animation above can be found on GitHub. Here is a shortcut to the debug draw utility class. And here is a shortcut to the shaders.

I’ve recently started picking up Unity, and quickly found out that the only easily accessible debug draw function is `Debug.DrawLine`, unless I was mistaken (in which case please do let me know).

So I thought it was a good opportunity to familiarize myself with Unity’s environment and a great exercise to implement a debug draw utility that draws various primitives, including rectangles, boxes, spheres, cylinders, and capsules. This post is essentially a quick documentation of what I have done and problems I’ve encountered.

As my first iteration, I took the naive approach and just wrote functions that internally make a bunch of calls to `Debug.DrawLine`. You can see such a first attempt here in the history.

The majority of the time was spent pretty much figuring out the right math, so nothing special. I guess the only thing worth pointing out is how I arranged the loops in the functions for spheres and capsules. My first instinct was to draw “from top to bottom”, looping from one pole to the other and constructing rings of line segments along the way, with special cases at the poles handled outside the loop. However, I didn’t like the idea of having part of the math outside the loop, as it didn’t feel elegant enough (note: this is just my personal preference). So I came up with a different way of doing it, where I “assemble identical longitudinal pieces” around the central axis that connects the poles. In this case, there are no special cases outside the loop body.

After my first attempt, I got curious as to how other people debug draw spheres in Unity, and I came across this gist. This is when it occurred to me that I can get better performance by caching the mathematical results into meshes, and then simply draw the cached meshes, as well as offloading some of the work onto the GPU with vertex shaders.

There are a bunch of primitives in my debug draw utility, so I won’t enumerate every single one of them. I’ll just use the capsule as an example.

I didn’t want to create a new mesh for every single combination of height, radius, latitudinal segments, and longitudinal segments, because you can have so many different combinations of floats that it’s impractical. Instead, I used just the latitudinal and longitudinal segments to generate a key for each cached mesh, and modify the vertices in the vertex shader with height and radius as shader input.

```csharp
private static Dictionary<int, Mesh> s_meshPool;
private static Material s_material;
private static MaterialPropertyBlock s_matProperties;

public static void DrawCapsule(...)
{
  if (latSegments <= 0 || longSegments <= 1)
    return;

  if (s_meshPool == null)
    s_meshPool = new Dictionary<int, Mesh>();

  int meshKey = (latSegments << 16 ^ longSegments);
  Mesh mesh;
  if (!s_meshPool.TryGetValue(meshKey, out mesh))
  {
    mesh = new Mesh();

    // ...

    s_meshPool.Add(meshKey, mesh);
  }

  if (s_material == null)
  {
    s_material = new Material(Shader.Find("CjLib/CapsuleWireframe"));
  }

  if (s_matProperties == null)
    s_matProperties = new MaterialPropertyBlock();

  s_matProperties.SetColor("_Color", color);
  s_matProperties.SetVector("_Dimensions", new Vector4(height, radius));

  Graphics.DrawMesh(mesh, center, rotation, s_material, 0, null, 0, s_matProperties);
}
```
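A quick note on the mesh key: because both segment counts are small positive integers (well below 2¹⁶), shifting one by 16 bits and XOR-ing in the other behaves exactly like a bitwise OR, so every (latitudinal, longitudinal) pair maps to a unique key. A small Python sketch of the same packing scheme (my own illustration):

```python
# pack two small non-negative ints into one key; for values below 2**16
# the XOR cannot collide with the shifted bits, so keys are unique
def mesh_key(lat_segments, long_segments):
    assert 0 <= lat_segments < (1 << 16)
    assert 0 <= long_segments < (1 << 16)
    return (lat_segments << 16) ^ long_segments

# every distinct (lat, long) pair yields a distinct key
keys = {mesh_key(la, lo) for la in range(64) for lo in range(64)}
```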

And below is the vertex shader. I basically shift each cap towards the center, scale the vertices using the radius, and push them back out using the height. I used the `sign` function to effectively branch on which side of the XZ plane the vertices are on, without actually introducing a code branch in the shader.

```hlsl
float4 _Dimensions; // (height, radius, *, *)

v2f vert (appdata v)
{
  v2f o;

  float ySign = sign(v.vertex.y);
  v.vertex.y -= ySign * 0.5f;
  v.vertex.xyz *= _Dimensions.y;
  v.vertex.y += ySign * 0.5f * _Dimensions.x;

  o.vertex = UnityObjectToClipPos(v.vertex);
  return o;
}
```

However, I spent 2 hours past midnight just scratching my head, trying to figure out why some of my debug draw meshes pop around as I shift and rotate the camera. It was as if the positional pops are dependent on the camera position and orientation, which was quite bizarre. It finally occurred to me that I might not have been consistently getting vertex positions in object space in the vertex shader, and based on that assumption I found this post that confirmed my suspicion.

Basically, Unity has draw call batching turned on by default, so it inconsistently passed vertex positions to vertex shaders in either object space or world space. It’s actually stated in Unity’s documentation here, under the not-so-obvious `DisableBatching` tag section, that vertex shaders operating in object space won’t work reliably if draw call batching is on.

Although the process of figuring out what went wrong was annoying, the fix was luckily quite simple: just disable draw call batching in the vertex shaders.

Tags { "DisableBatching" = "true" }

That’s it! I hope you find this post interesting. I will likely continue to document my ventures into the Unity world.

The source files for generating the animations in this post are on GitHub.

A Chinese translation of this post is available here (by Wayne Chen).

Timeslicing is a very useful technique to improve the performance of batched algorithms (multiple instances of the same algorithm): instead of running all instances of algorithms in a single frame, spread them across multiple frames.

For instance, if you have 100 NPCs in a game level, you typically don’t need to have every one of them make a decision in every single frame; having 50 NPCs make decisions in each frame would effectively reduce the decision performance overhead by 50%, 25 NPCs by 75%, and 20 NPCs by 80%.

Note that I said timeslicing the decisions, __not__ the whole update logic of the NPCs. In every frame, we’d still want to animate every NPC, or at least the ones closer and more noticeable to the player, based on the **latest decision**. The extra animation layer can usually hide the slight latency in the timesliced decision layer.

Also bear in mind that I will not be discussing how to finish a single algorithm across multiple frames, which is another form of timeslicing that is not within the scope of this post. Rather, this post will focus on spreading multiple instances of the same algorithm across multiple frames, where each instance is small enough to fit in a single frame.

This timeslicing technique applies to batched algorithms that are not hyper-sensitive to latency. If even a single frame of latency is critical to certain batched algorithms, it’s probably not a good idea to timeslice them.

In this post, I’d like to cover:

- An example that involves running multiple instances of a simple algorithm in batch.
- How to timeslice such batched algorithms.
- A categorization for timeslicing based on the timing of input and output.
- A sample implementation of a timeslicer utility class.
- Finally, how threads can be brought into the mix.

The example I’m going to use is a simple logic that orients NPCs to face a target. Each NPC’s decision layer computes the desired orientation to face the target, and the animation layer tries to rotate the NPCs to match their desired orientation, capped at a maximum angular speed.

First, let’s see an animated illustration of what it might look like if this algorithm is run for every NPC in every frame (Update All).

The moving circle is the target, the black pointers represent NPCs and their orientation, and the red indicators represent the NPCs’ desired orientation.

And the code looks something like this:

```cpp
void NpcManager::UpdateFrame(float dt)
{
  for (Npc &npc : m_npcs)
  {
    npc.UpdateDesiredOrientation(target);
    npc.Animate(dt);
  }
}

void Npc::UpdateDesiredOrientation(const Object &target)
{
  m_desiredOrientation = LookAt(target);
}

void Npc::Animate(float dt)
{
  Rotation delta = Diff(m_desiredOrientation, m_currentOrientation);
  delta = Limit(delta, m_maxAngularSpeed);
  m_currentOrientation = Apply(m_currentOrientation, delta);
}
```

As mentioned above, you typically don’t need to update all the NPCs’ decisions in one frame. We can achieve rudimentary timeslicing like this:

```cpp
void NpcManager::UpdateFrame(float dt)
{
  const unsigned kMaxUpdates = 4;
  unsigned npcUpdated = 0;
  while (npcUpdated < m_numNpcs && npcUpdated < kMaxUpdates)
  {
    m_aNpc[m_iNpcWalker].UpdateDesiredOrientation(target);
    m_iNpcWalker = (m_iNpcWalker + 1) % m_numNpcs;
    ++npcUpdated;
  }

  for (Npc &npc : m_npcs)
  {
    npc.Animate(dt);
  }
}
```

This straightforward approach could be enough. However, sometimes you just need more control over the timing of input and output. Using the more involved timeslicing logic presented below, you can have a choice of different timing of input and output to suit specific needs.

Before going any further, let’s take a look at the terminology that will be used throughout this post.

- Completing a **batch** means finishing running the decision logic once for each NPC.
- A **job** represents the work to run an instance of decision logic for an NPC.
- The **input** is the data required to run a job.
- The **output** is the results from a job after it’s finished.

Now the timeslicing logic.

Here are the steps of one way to timeslice batched algorithms. It’s probably not the absolute best in terms of efficiency or memory usage, but I find it logically clear and easy to maintain (which also means it’s good for presentational purposes). So unless you absolutely need to micro-optimize, I wouldn’t worry about it too much.

- Start a new batch.
- Figure out all the jobs that need to be done. Associate each job with a unique **key** that can be used to infer the required input for the job.
- For each job, prepare an instance of job **parameters** that is a collection of its key, input, and output.
- Start and finish up to a max number of jobs per frame.
- Depending on the timing of output (more on this later), **save** the **results** of a job, including the job’s output and its associated key, by pushing it to a **ring buffer** that represents the **history** of job results. The rest of the game logic can then query the latest results by key.
- After all jobs are finished, the batch is finished. Rinse and repeat.
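The steps above can be sketched as a minimal single-threaded timeslicer. This Python sketch (my own illustration with hypothetical names, not the C++ from a real game codebase) uses async input and async output:

```python
from collections import deque

class Timeslicer:
    """Minimal sketch: async input / async output, one batch at a time."""

    def __init__(self, job_fn, max_jobs_per_frame, history_size):
        self.job_fn = job_fn  # key -> output (the job reads its own input)
        self.max_jobs_per_frame = max_jobs_per_frame
        # ring buffer of (key, output) results; old entries fall off the end
        self.history = deque(maxlen=history_size)
        self.pending = []

    def start_batch(self, keys):
        # figure out all the jobs that need to be done, one key per job
        self.pending = list(keys)

    def update_frame(self):
        # start and finish up to a max number of jobs per frame
        for _ in range(self.max_jobs_per_frame):
            if not self.pending:
                break
            key = self.pending.pop(0)
            self.history.append((key, self.job_fn(key)))

    def latest_output(self, key):
        # scan newest-first for the most recent result for this key
        for k, out in reversed(self.history):
            if k == key:
                return out
        return None

# demo: 4 NPCs, at most 2 decision jobs per frame
ts = Timeslicer(job_fn=lambda npc_id: npc_id * 10,
                max_jobs_per_frame=2, history_size=8)
ts.start_batch([0, 1, 2, 3])
ts.update_frame()  # jobs 0 and 1 finish; 2 and 3 are still pending
```

With this setup, a full batch of N jobs completes in ceil(N / max_jobs_per_frame) frames, and the rest of the game logic only ever asks `latest_output(key)`.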

One advantage of looking up output by key is that different timesliced systems can work with each other just fine, even if they reference each other’s output. As far as a system is concerned, it’s looking up output from another system using a key, and the other system is reporting back the latest valid output available associated with the given key. Sort of like a mini database.

In our example, since each job is associated with an NPC, it seems fitting to use the NPCs as individual keys.

Next, here’s a categorization of timeslicing, based on the timing of reading input and saving output.

NOTE: The use of words “synchronous” and “asynchronous” here has nothing to do with multi-threading. The words are only used to distinguish the timing of operations. Everything presented before the “Threads” section later in this post is single-threaded.

- **Asynchronous Input**: input is read by a job only when it’s started.
- **Synchronous Input**: input is read by all jobs when a new batch starts.
- **Asynchronous Output**: a job’s output is saved as soon as the job finishes.
- **Synchronous Output**: output of all jobs is saved when a batch finishes.

A ring buffer is used so that the rest of the game logic can be completely agnostic to the timing, and assume that the output (queried by key) is the latest.

Mixing and matching different timing of input and output gives 4 combinations. Async input / async output (AIAO), sync input / sync output (SISO), sync input / async output (SIAO), and async input / sync output (AISO). Let’s look at them one by one.

For demonstrational purposes, all animated illustrations below reflect a setup where only one job is started in each frame. The number should be set higher in a real game if it is introducing unacceptable latency.

For our specific example of NPCs turning to face the target, the AIAO combination probably makes the most sense. The input is read only when the job starts, so the job has the latest position of the target. The output is saved as soon as the job finishes with results of NPC’s desired orientation, so the NPC’s animation layer can react to the latest desired orientation immediately.

Here’s an animated illustration of what it could look like if we run the jobs at 10Hz (10 NPC jobs per second).

And here’s what it looks like if done at 30Hz.

You can see that each NPC waits until its job starts before getting the latest position of the target, and updates its desired orientation as soon as the job finishes.

For cases where asynchronous input from the AIAO combination as shown above causes unwanted staggering, yet we still want NPCs to react as soon as each of their jobs finishes, we can use the SIAO combination.

Here’s the 10Hz version.

And here’s the 30Hz version.

Note that when each job starts, it’s using the same target position as input, which has been synchronized at the start of each batch, while the output is saved for immediate NPC reaction as soon as each job finishes.

This is effectively the same as the first “basic first attempt” at timeslicing shown above.

The SISO combination is probably best explained by looking at the animated illustrations first. In order, below are the 10Hz and 30Hz versions of this combination.

It’s basically a “laggy” version of the very first animated illustration where every NPC is fully updated in every frame. All job input is synchronized upon batch start, and all output is saved out upon batch finish. Essentially this is kind of a “double buffer”, where the latest results aren’t reflected until all jobs in a batch are finished. For this reason, the history ring buffer must be **at least twice as large** as the max batch size for combinations with **synchronized output** to work properly.

The SISO combination is probably not ideal for our specific example. However, for cases like updating influence maps, heat maps, or any kind of game space analysis, the SISO combination could prove useful.

To be frank, I can’t think of a proper scenario to justify the use of the AISO combination. It’s only included here for completeness. See the animated illustrations below in the order of the 10Hz version and 30Hz version. If you can think of a case where the AISO combination is a superior choice to the other three, please share your ideas in the comments or email me. I’d really like to know.

Now that we’ve seen all four combinations of timeslicing, it’s time to look at a sample implementation that does exactly what has been shown above.

Before going straight to the core timeslicing logic, let’s first look at how it plugs into the sample NPC code we saw earlier.

The timeslicer utility class allows users to provide a function that sets up keys for a new batch (writes to an array and returns the new batch size), a function that sets up input for a job (writes to input based on key), and a function containing the logic to be timesliced (writes to output based on key and input).

```cpp
class NpcManager
{
private:
    struct NpcJobInput
    {
        Point m_targetPos;
    };

    struct NpcJobOutput
    {
        Orientation m_desiredOrientation;
    };

    // timeslicing utility class
    Timeslicer
    <
        Npc *,        // key
        NpcJobInput,  // input
        NpcJobOutput, // output
        kMaxNpcs,     // max batch size
        false,        // sync input flag (false = async)
        false         // sync output flag (false = async)
    >
    m_npcTimeslicer;

    // ...other stuff
};

void NpcManager::Init()
{
    // set up keys for new batch
    auto newBatchFunc = [this](Npc **aKey) -> unsigned
    {
        for (unsigned i = 0; i < m_numNpcs; ++i)
        {
            aKey[i] = GetNpc(i);
        }
        return m_numNpcs;
    };

    // set up input for job
    auto setUpInputFunc = [this](Npc *pNpc, NpcJobInput *pInput) -> void
    {
        pInput->m_targetPos = GetTargetPosition(pNpc);
    };

    // logic to be timesliced
    auto jobFunc =
        [this](Npc *pNpc, const NpcJobInput &input, NpcJobOutput *pOutput) -> void
    {
        pOutput->m_desiredOrientation = LookAt(pNpc, input.m_targetPos);
    };

    // initialize timeslicer
    m_npcTimeslicer.Init(newBatchFunc, setUpInputFunc, jobFunc);
}

void NpcManager::UpdateFrame(float dt)
{
    // timesliced decision logic
    m_npcTimeslicer.Update(maxJobsPerFrame);

    // animate all NPCs based on latest decision results
    for (Npc &npc : m_npcs)
    {
        NpcJobOutput output;
        if (m_npcTimeslicer.GetOutput(&npc, &output))
        {
            npc.SetDesiredOrientation(output.m_desiredOrientation);
        }
        npc.Animate(dt);
    }
}
```

And below is the timeslicer utility class in its entirety.

```cpp
template
<
    typename Key,
    typename Input,
    typename Output,
    unsigned kMaxBatchSize,
    bool kSyncInput,
    bool kSyncOutput
>
class Timeslicer
{
private:
    struct JobParams
    {
        Key m_key;
        Input m_input;
        Output m_output;
    };

    struct JobResults
    {
        Key m_key;
        Output m_output;
    };

    // number of jobs in current batch
    unsigned m_batchSize;

    // keep track of jobs in current frame
    unsigned m_iJobBegin;
    unsigned m_iJobEnd;

    // required to start jobs
    JobParams m_aJobParams[kMaxBatchSize];

    // keep track of job results (statically allocated)
    static const unsigned kMaxHistorySize =
        kSyncOutput
        ? 2 * kMaxBatchSize // more on this later
        : kMaxBatchSize;
    typedef RingBuffer<JobResults, kMaxHistorySize> History;
    History m_history;

    // set up keys for new batch
    // (number of keys = batch size = jobs per batch)
    typedef std::function<unsigned (Key *)> NewBatchFunc;
    NewBatchFunc m_newBatchFunc;

    // set up input for job
    typedef std::function<void (Key, Input *)> SetUpInputFunc;
    SetUpInputFunc m_setUpInputFunc;

    // logic to be timesliced
    // (takes key and input, writes output)
    typedef std::function<void (Key, const Input &, Output *)> JobFunc;
    JobFunc m_jobFunc;

public:
    void Init
    (
        NewBatchFunc newBatchFunc,
        SetUpInputFunc setUpInputFunc,
        JobFunc jobFunc
    )
    {
        m_newBatchFunc = newBatchFunc;
        m_setUpInputFunc = setUpInputFunc;
        m_jobFunc = jobFunc;
        Reset();
    }

    void Reset()
    {
        m_batchSize = 0;
        m_iJobBegin = 0;
        m_iJobEnd = 0;
    }

    bool GetOutput(Key key, Output *pOutput) const
    {
        // iterate from newest history (last queued output)
        for (const JobResults &results : m_history.Reverse())
        {
            if (key == results.m_key)
            {
                *pOutput = results.m_output;
                return true;
            }
        }
        return false;
    }

    void Update(unsigned maxJobsPerUpdate)
    {
        TryStartNewBatch();
        StartJobs(maxJobsPerUpdate);
        FinishJobs();
    }

private:
    void TryStartNewBatch()
    {
        if (m_iJobBegin == m_batchSize)
        {
            // synchronous output saved on batch finish
            if (kSyncOutput)
            {
                for (unsigned i = 0; i < m_batchSize; ++i)
                {
                    const JobParams &params = m_aJobParams[i];
                    SaveResults(params);
                }
            }

            Reset();

            Key aKey[kMaxBatchSize];
            m_batchSize = m_newBatchFunc(aKey);

            for (unsigned i = 0; i < m_batchSize; ++i)
            {
                JobParams &params = m_aJobParams[i];
                params.m_key = aKey[i];

                // synchronous input set up on new batch start
                if (kSyncInput)
                {
                    m_setUpInputFunc(params.m_key, &params.m_input);
                }
            }
        }
    }

    void StartJobs(unsigned maxJobsPerUpdate)
    {
        unsigned numJobsStarted = 0;
        while (m_iJobEnd < m_batchSize && numJobsStarted < maxJobsPerUpdate)
        {
            JobParams &params = m_aJobParams[m_iJobEnd];

            // asynchronous input set up on job start
            if (!kSyncInput)
            {
                m_setUpInputFunc(params.m_key, &params.m_input);
            }

            m_jobFunc(params.m_key, params.m_input, &params.m_output);

            ++m_iJobEnd;
            ++numJobsStarted;
        }
    }

    void FinishJobs()
    {
        while (m_iJobBegin < m_iJobEnd)
        {
            const JobParams &params = m_aJobParams[m_iJobBegin++];

            // asynchronous output saved on job finish
            if (!kSyncOutput)
            {
                SaveResults(params);
            }
        }
    }

    void SaveResults(const JobParams &params)
    {
        JobResults results;
        results.m_key = params.m_key;
        results.m_output = params.m_output;

        if (m_history.IsFull())
        {
            m_history.Dequeue();
        }
        m_history.Enqueue(results);
    }
};
```

If your game engine allows multi-threading, we can go one step further by offloading jobs to threads. Starting a job now creates a thread that runs the timesliced logic, and finishing a job now waits for the thread to finish. We need to use read/write locks to make sure the timeslicer plays nicely with the rest of the game logic. The required changes are shown below.

```cpp
class Timeslicer
{
    // ...unchanged code omitted

    // mutable so the read lock can be acquired in the const GetOutput
    mutable RwLock m_lock;

    struct JobParams
    {
        std::thread m_thread;
        Key m_key;
        Input m_input;
        Output m_output;
    };

    bool GetOutput(Key key, Output *pOutput) const
    {
        ReadAutoLock readLock(m_lock);

        // iterate from newest history (last queued output)
        for (const JobResults &results : m_history.Reverse())
        {
            if (key == results.m_key)
            {
                *pOutput = results.m_output;
                return true;
            }
        }
        return false;
    }

    void TryStartNewBatch()
    {
        WriteAutoLock writeLock(m_lock);

        if (m_iJobBegin == m_batchSize)
        {
            // synchronous output saved on batch finish
            if (kSyncOutput)
            {
                for (unsigned i = 0; i < m_batchSize; ++i)
                {
                    const JobParams &params = m_aJobParams[i];
                    SaveResults(params);
                }
            }

            Reset();

            Key aKey[kMaxBatchSize];
            m_batchSize = m_newBatchFunc(aKey);

            for (unsigned i = 0; i < m_batchSize; ++i)
            {
                JobParams &params = m_aJobParams[i];
                params.m_key = aKey[i];

                // synchronous input set up on new batch start
                if (kSyncInput)
                {
                    m_setUpInputFunc(params.m_key, &params.m_input);
                }
            }
        }
    }

    void StartJobs(unsigned maxJobsPerUpdate)
    {
        WriteAutoLock writeLock(m_lock);

        unsigned numJobsStarted = 0;
        while (m_iJobEnd < m_batchSize && numJobsStarted < maxJobsPerUpdate)
        {
            JobParams &params = m_aJobParams[m_iJobEnd];

            // asynchronous input set up on job start
            if (!kSyncInput)
            {
                m_setUpInputFunc(params.m_key, &params.m_input);
            }

            // offload job to a thread
            // (capture this for m_jobFunc; the params slot stays valid
            // until the thread is joined in FinishJobs)
            params.m_thread = std::thread([this, &params]() -> void
            {
                m_jobFunc(params.m_key, params.m_input, &params.m_output);
            });

            ++m_iJobEnd;
            ++numJobsStarted;
        }
    }

    void FinishJobs()
    {
        WriteAutoLock writeLock(m_lock);

        while (m_iJobBegin < m_iJobEnd)
        {
            JobParams &params = m_aJobParams[m_iJobBegin++];

            // wait for the thread to finish
            params.m_thread.join();

            // asynchronous output saved on job finish
            if (!kSyncOutput)
            {
                SaveResults(params);
            }
        }
    }
};
```

If your game can afford to have one more frame of latency and you don’t want the timeslicer squatting a thread, you can tweak the update function a bit, where jobs are started at the end of update in the current frame, and are finished at the beginning of update in the next frame.

```cpp
void Timeslicer::Update(unsigned maxJobsPerUpdate)
{
    // finish jobs started at the end of the previous frame's update
    FinishJobs();
    TryStartNewBatch();
    StartJobs(maxJobsPerUpdate);
}
```

That’s it! We’ve seen how timeslicing batched algorithms can help with game performance, as well as the 4 combinations of input and output with different timing, each having its own use (well, maybe not the last one). We’ve also seen how the timeslicing logic can be further adapted to make use of threads.

I hope you find this useful.

Also, this post is part 2 of a series (part 1) leading up to a geometric interpretation of Fourier transform and spherical harmonics.

Drawing analogy from vector projection, we have seen what it means to “project” a curve onto another in the previous post. This time, we’ll see how to find the closest vector on a plane via vector projection, and then we’ll see how it translates to finding the best approximation of a curve via curve “projection”. This handy analogy can help us take another step closer to a geometric interpretation of Fourier transform and spherical harmonics later.

Given vectors , , and , the closest vector on the plane formed (or “spanned” in linear algebra jargon) by and is the projection of onto the plane. This projection, denoted , is a combination of scaled and , in the form of , that has the least error from .

The error is measured by the magnitude of the difference vector:

As pointed out in the previous post, minimizing this error is essentially equivalent to minimizing the root mean square error (RMSE):

This is what the relationship of , , , and looks like visually:

The projection of onto the plane spanned by and is the vector on the plane that has the least error from , and the difference vector is orthogonal to the plane.

So how do we compute ? In the previous post we’ve seen how to project a vector onto another, so would computing be as simple as projecting onto , and then projecting the result again onto ? Not really. Here’s why:

As you can see in the figure above, isn’t parallel to nor . Projecting onto would give you a vector that is parallel to , and a subsequent projection onto would leave you with a result that is parallel to , which is definitely not .

One way to do it is to calculate a vector orthogonal to the plane, i.e. a plane normal , by taking the cross product of the two vectors that span the plane: . Then, take out the part in that is parallel to by subtracting the projection of onto from . What is left of is the part of that is parallel to the plane, i.e. the projection:

But, I want to talk about another way of performing the projection, which is easier to translate to curves later. and are not necessarily orthogonal to each other. Let’s find two orthogonal vectors that lie on the plane spanned by and . Then, we split into two parts, one parallel to one vector and one parallel to the other vector. Finally, we combine these two parts together to obtain a vector that is essentially the part of that is parallel to the plane.

As a simple illustration, if the plane is the X-Z plane, then the obvious two orthogonal vectors of choice would be and . To project a vector onto the X-Z plane, we split it into a part that is parallel to , which is , and a part that is parallel to , which is . Combining those two parts together would give us . This makes sense, because projecting a vector onto the X-Z plane is just as simple as dropping the Y component.

Now, given two arbitrary vectors and that span a plane, we can generate two orthogonal vectors, denoted and , by using a method called the Gram-Schmidt process. The first vector would simply be the . To compute the second vector , we take away from its part that is parallel to ; what’s left of is orthogonal to :

To compute , we combine the parts of that are parallel to and , respectively:

The Gram-Schmidt process is actually more general than described above. It can apply to higher dimensions. Given vectors, denoted to , in an -dimensional space (), and if the vectors are linearly independent, i.e. they span an -dimensional subspace, then we can generate vectors that are orthogonal to each other, denoted through , spanning the same subspace, using the Gram-Schmidt process.

The first vector would simply be . To compute the second vector , we take away from its part that is parallel to . To compute the third vector , we take away from its part that is parallel to **all previously generated orthogonal vectors**, and . Repeat this process until we have reached and produced :

Projecting an -dimensional vector onto this -dimensional subspace would involve combining the parts of the vector parallel to each of the orthogonal vectors. In our example above that involves 3D vectors, and . In higher dimensions, no simple 3D cross products can save you there.

Now we are done with vectors. Let’s take a look at curves!

Let’s say our interval of interest is . Given a 3rd-order polynomial curve , what’s the best approximation using a 2nd-order polynomial curve, or a 1st-order polynomial curve (a straight line)? How about simply dropping the higher-order terms, so we get and ? Here’s what they look like:

At first glance, I’d say and are not what we want. We can definitely find a parabolic curve and a line that approximate better. Look at just how far apart and are from at . Clearly, and are not the 2nd-order and 1st-order polynomial curves that have the least RMSEs from . Simply dropping higher-order terms turns out to be a naive approach. The right way to do it is just like what we did with vectors: projection.

In the vector example above, we were operating in the 3D geometric space. Now we are working with a more abstract 3rd-order polynomial space where lives. The lower-order polynomial curve that has the least RMSE from is the projection of into that lower-order polynomial space. Let’s start with finding the 2nd-order polynomial curve that has the least RMSE from .

The 2nd-order polynomial subspace is 3-dimensional, since a 2nd-order polynomial curve has the form . Let’s first find 3 curves that span the subspace. An easy pick would be , , and . Now we need to use them to generate a set of orthogonal curves, , , and using the Gram-Schmidt process:

If you forgot how to “project” a curve onto another, please refer to the previous post.

Here are the results:

You can say that , , and are a set of orthogonal axes spanning the 2nd-order polynomial subspace. Now we split into three orthogonal parts by projecting it onto , , and :

Here’s what , , and look like alongside :

and might not look like they are close to , but they are the closest curves you can get along the axes and that have the least RMSEs from .

Now, we can combine the three orthogonal parts of to form the 2nd-order polynomial curve that is the best approximation of :

This looks way better than the result of simply dropping the 3rd-order term, as shown in the figure above.

Since the three parts are already orthogonal, we can actually obtain the 1st-order polynomial curve that best approximates by simply dropping from :

Also looking good, compared to simply dropping the 3rd-order and 2nd-order terms.

That’s it. In this post, we’ve seen how to generate a set of orthogonal curves from a set of curves spanning a lower-dimensional subspace of curves, and use the orthogonal curves to find the best approximation of a curve via curve “projection”.

We now have all the tools we need to move onto Fourier transform and spherical harmonics in the next post. Finally, something game-related!

Also, this post is part 1 of a series (part 2) leading up to a geometric interpretation of Fourier transform and spherical harmonics.

Fourier transform and spherical harmonics are mathematical tools that can be used to represent a function as a combination of periodic functions (functions that repeat themselves, like sine waves) of different frequencies. You can approximate a complex function by using a limited number of periodic functions at certain frequencies. Fourier transform is often used in audio processing to post-process signals as combinations of sine waves of different frequencies, instead of single streams of sound waves. Spherical harmonics can be used to approximate baked ambient lighting in game levels.

We’ll revisit these tools in later posts, so it’s okay if you’re still not clear how they can be of use at this point. First, let’s start somewhere more basic.

If you have two vectors and , **projecting onto ** means stripping out part of that is orthogonal to , so the result is the part of that is parallel to . Here is a figure I borrowed from Wikipedia:

The **dot product** of and is a scalar equal to the product of magnitude of both vectors and the cosine of the angle (, using the figure above) between the vectors:

Another way of calculating the dot product is adding together the component-wise products. If and , then:

A follow-up to the alternate formula above is the formula for vector magnitude. The magnitude of a vector is the square root of the dot product of the vector with itself:

The geometric meaning of the dot product is the magnitude of the projection of onto **scaled** by . Dot product is commutative, i.e. , which means that is also equal to the magnitude of the projection of onto scaled by .

So if you want to get the magnitude of the projection of onto , you need to divide the dot product by the magnitude of :

To get the actual projected vector of onto , multiply the magnitude with the unit vector in the direction of , denoted by :

One important property of dot product is: if it’s positive, the two vectors point in roughly the same direction (); if it’s zero, the vectors are orthogonal (); if it’s negative, the vectors point away from each other ().

For the dot product of two unit vectors, like , it’s just . If it’s 1, the two vectors point in exactly the same direction (); if it’s -1, they point in exactly opposite directions (). So, to measure how close the directions of two vectors are, we can normalize both vectors and take their dot product.

Let’s say we have three vectors: , , and . If we want to determine which of and points in a direction closer to where points, we can just compare the dot products of their normalized versions, and . Whichever vector’s normalized version has a larger dot product with points in a direction closer to that of .

A metric often used to measure the difference between two data objects is the root mean square error (RMSE) which is the square root of the average of component-wise errors. For vectors, that means:

It kind of makes sense, because it is exactly the magnitude of the vector that is the difference between and scaled by :

It’s also the square root of the dot product of the difference vector with itself scaled by :

Here’s an important property of projection:

The projection of a vector onto another vector is the vector parallel to that has the **minimal RMSE** with respect to . In other words, gives you **the best scaled version of to approximate **.

Also note that if is larger than , it means has a smaller RMSE than with respect to ; thus, points in a direction closer to that of than does.

Now we’re finished with vectors. It’s time to move onto curves.

Let’s consider these three curves:

When working with curves, as opposed to vectors, we need to additionally specify an interval of interest. For simplicity, we will consider for the rest of this post.

Below is a figure showing what they look like side-by-side within our interval of interest:

Just like vectors, “projecting” a curve onto another curve gives you the best scaled version of to approximate , and the “projection” has minimal RMSE with respect to . To compute the RMSE of curves, we need to first figure out how to compute the “dot product” of two curves.

Recall that the dot product of vectors is equal to the sum of component-wise products:

Mirroring that, let’s sum up the products of samples of curves at regular intervals, and normalize the sum by dividing it by the number of samples, so we don’t get drastically different results due to different numbers of samples. If we take 10 samples between to compute the dot product of and , we get:

The more samples we use, the more accuracy we get. What if we take an infinite number of samples so we get the most accurate result possible?

This basically turns into an integral:

So there we have it, one common definition of the “dot product” of two curves:

**The integral of the product of two curves over the interval of interest**.

Copying the formula from vectors, the RMSE between two curves and is:

In integral form, it becomes:

The “mean” part of the error is omitted since it’s a division by 1, the length of our interval of interest.

To find out which one of the “normalized” versions of and has less RMSE with respect to the normalized version of , we take the dot products of the normalized versions of the curves:

The dot product of and is larger than that of and . That means has a lower RMSE than with respect to .

Drawing analogy from vectors, is conceptually “pointing in a direction” closer to that of than does.

Now let’s try finding the best scaled version of that has minimal RMSE with respect to , by computing the projection of onto :

And this is what , , and look like side-by-side:

The projected curve is a scaled-up version of that is a better approximation of than itself; it is the best scaled version of that has the least RMSE with respect to .

Now that you know how to “project” a curve onto another, we will see how to approximate a curve with multiple simpler curves while maintaining minimal error.

This post is part of the My Career series.

Here is the original English post.


Note: this post is formatted in BBS style for easy copying to PTT; apologies if that looks unusual.

Uncharted 4 has shipped, so I can finally share what I worked on.

I was mainly responsible for the single-player buddy AI, the multiplayer sidekick AI, and some gameplay logic.

I’ll skip the parts that didn’t make it into the final game, as well as the minor odds and ends.

** = The Post System = **

Before I begin, I’d like to talk about the post system we used to assign positions for NPCs to move to.

I didn’t write the core logic of this system; I wrote client code that makes use of it.

Posts are discrete positions within walkable space.

Most are automatically generated by tools, and some are hand-placed by designers.

Based on different needs, we designed different post rating systems

(e.g. stealth posts, combat posts),

and we pick the highest-rated post and send the NPC there.

** = Buddy Follow = **

The buddy follow system was inherited from The Last of Us.

The basic idea is that a buddy picks a follow position around the player.

These candidate follow positions fan out from the player’s position

and must satisfy the following linear path clearance tests:

– player to follow position

– follow position to a forward-projected position

– forward-projected position to player

Climbing is a new feature in Uncharted 4 that The Last of Us didn’t have.

To integrate it with the existing follow system, I used climb posts to let buddies climb along with the player.

This feature turned out trickier than I expected.

Simply switching the buddy’s climb state based on the player’s climb state didn’t work well:

whenever the player rapidly toggled between climbing and non-climbing states, the buddy would flicker between the two states.

So I added hysteresis:

the buddy only follows suit after the player has switched states and kept moving in that state for a certain distance.

Broadly speaking, hysteresis is a good way to fix behavioral flickering.

** = Buddy Lead = **

In certain scenarios in the game, we wanted a buddy to lead the player forward.

I ported the lead system over from The Last of Us.

Designers use splines to mark the rough route along which they want the buddy to lead the player through a level.

If there are multiple lead paths, designers switch the active one via script.

The player’s position is projected onto the spline, then pushed forward to form a lead reference point.

When the lead reference point passes a spline control point marked as a wait point, the buddy moves on to the next wait point.

If the player backtracks,

the buddy only turns back once the lead reference point is a certain distance away from the furthest wait point reached during the last advance.

This is, again, hysteresis used to avoid behavioral flickering.

I also integrated dynamic movement speed into the lead system.

Several “speed planes” are placed along the spline, based on the distance between the buddy and the player.

Buddies have three motion types: walk, run, and sprint.

Depending on which speed plane the player hits, the buddy picks a different motion type.

In addition, the buddy’s locomotion animation speed is slightly scaled based on the player’s distance,

to avoid abrupt movement speed changes when switching motion types.

** = Buddy Cover Share = **

In The Last of Us, the player and a buddy can overlap without either of them leaving cover.

We call this cover share.

In The Last of Us, Joel reaches over Ellie and Tess to press his hand against the cover.

It looks natural, because the buddies all have smaller builds than the player.

But the same move wouldn’t suit Nate, Sam, Sully, and Elena, who are all about the same build.

Besides, Uncharted 4 is faster-paced,

and having Nate reach out to press against the cover would only hurt the fluidity of his movement.

So we decided to simply have the buddy hunker against the cover while the player steers slightly around them.

The logic I used is very simple:

if the player’s position, projected along the movement direction, falls within a box around the buddy’s cover post,

the buddy aborts its current cover behavior and quickly hunkers against the cover.

** = Medic Sidekicks = **

I was responsible for the multiplayer sidekicks, and medics are the most special among them.

No NPC in single-player behaves the way medic sidekicks do:

they revive downed allies and mirror the player’s cover behavior.

Medic sidekicks try to mimic the player’s cover behavior and stay as close to the player as possible,

so when the player is downed, they can quickly run over and revive.

If the player has the medic’s RevivePak mod equipped,

they throw a RevivePak at the downed revival target before running over to revive.

Throwing a RevivePak basically reuses the grenade’s trajectory clearance test and throwing animation —

I just swapped the grenade out for a RevivePak.

** = Stealth Grass = **

Crouch-moving through stealth grass is another feature new to Uncharted 4.

To implement it, we needed some way to mark up the environment

so the gameplay logic can tell whether the player is in stealth grass.

At first we had the artists mark surfaces of the background geometry in Maya,

but the communication overhead between artists and designers was too long to iterate on levels frequently.

So we decided to mark stealth grass a different way:

I added an extra stealth grass tag to the nav mesh in the level editor,

so designers can mark stealth grass precisely, right in the editor.

With this extra markup,

we can also use the information to rate stealth posts.

** = Perception = **

Uncharted 4 doesn’t have a listen mode like The Last of Us,

so we had to find another way to make the player aware of nearby enemy threats,

so the player doesn’t feel lost in an unknown hostile environment.

Using the enemies’ perception data, I added threat indicators.

When an enemy starts to notice the player (white), becomes suspicious (yellow), or spots the player (orange),

these indicators alert the player in time.

I also play a buzzing background noise while a threat indicator builds up, to create tension,

and a loud stinger when the player is spotted.

The arrangement and purpose of these sounds are similar to The Last of Us.

** = Investigation = **

This was the last feature I was responsible for before we went gold.

I normally don’t attend formal meetings at Naughty Dog,

but in the last few months before going gold, we met at least once a week,

hosted by Bruce Straley or Neil Druckmann, focusing on the game’s AI.

After almost every meeting, the investigation system needed changes,

and it went through several major revisions overall.

Two kinds of things make enemies suspicious: the player and corpses.

When an enemy becomes suspicious (the instigator), he grabs the nearest ally to investigate with him.

The one closer to the point of suspicion becomes the investigator; the other becomes the watcher.

The instigator may end up as either the investigator or the watcher.

We have two sets of dialogue for the two different situations.

(“Something’s over there, I’ll check it out” vs. “Something’s over there, you go check it out”)

To make two-person investigations look more natural,

I staggered the timing of the two characters’ actions and threat indicators.

Otherwise, their perfectly synchronized behavior looked mechanical and unnatural.

If the investigator finds a corpse, he notifies all allies to start searching for the player.

The corpse is also temporarily marked, so the player knows why the enemies went on alert.

On some difficulties, triggering investigations repeatedly within a short time sharpens the enemies’ perception:

they spot the player more easily, even when the player is hiding in stealth grass.

On the hardest difficulty, enemies always stay in this sharpened state.

** = Dialogue Looks = **

This was also one of the last features I worked on.

The dialogue looks system drives characters to perform small motions during conversations,

such as turning their heads to look at each other, plus body gestures.

Back on The Last of Us,

the developers spent months hand-annotating every dialogue script in the game with dialogue looks.

We didn’t want to go through that drudgery again.

At this stage of development, some dialogue scripts had already been hand-annotated with dialogue looks.

We needed a general-purpose system that could automatically generate dialogue looks for scripts without annotations,

and I was responsible for building that system.

Animators can tune parameters such as head turn speed, head turn angle, gaze duration, and repeat interval.

** = Maintaining Jeep Momentum = **

One problem we ran into early in development was in the Madagascar jeep-driving level:

when the player’s jeep hit a wall or an enemy vehicle, it would spin out, fall behind the convoy, and fail the level.

My solution was that, when the player’s jeep hits a wall or an enemy vehicle,

we briefly cap the jeep’s maximum angular velocity and the rate of change of its linear velocity direction.

This simple approach worked quite well; players became much less likely to spin out and fail the level.

** = Vehicle Deaths = **

Drivable vehicles make their first appearance in Uncharted 4.

Before that, all vehicles were driven by NPCs and moved along fixed rails.

I was responsible for vehicle deaths.

There are several ways to destroy a vehicle:

take out the driver, shoot the vehicle, ram an enemy motorcycle, or ram an enemy jeep into a spin-out.

Based on the manner of death, the vehicle death system picks death animations for the vehicle and its passengers to play.

The death animation gradually blends into the physics-driven ragdoll,

so it transitions seamlessly into a physically simulated wreck.

When the player rams an enemy motorcycle with the jeep,

I use the motorcycle’s bounding box projected onto the XZ plane, plus the contact point,

to decide which of the four knock-off death animations to use.

As for ramming an enemy jeep into a spin-out,

I compare the enemy jeep’s angular deviation from its intended direction of travel against a spin-out threshold.

A vehicle playing its death animation can clip through walls.

I cast a sphere from the vehicle’s intended position towards its actual position;

if the cast hits a wall, the vehicle is nudged slightly along the wall’s normal vector.

The error isn’t fully corrected in one go, to avoid overly abrupt displacement.

I also implemented a special kind of vehicle death, called vehicle death hints.

These death hints are custom death animations placed in the level by animators and designers.

Each death hint has an entry window along the vehicle’s rail.

When a vehicle dies within a death hint’s entry window, it plays that hint’s special death animation.

This feature was originally developed for the super slick jeep death animation in the 2015 E3 demo.

** = Bayer Matrix for Dithering = **

We wanted to eliminate the artifact of the camera clipping through objects, especially the game’s various plants.

So we decided to fade out pixels close to the camera.

Using translucent pixels wasn’t a good option, because it’s very expensive.

The technique we used is dithering:

https://en.wikipedia.org/wiki/Dither

Dithering combined with a Bayer matrix

uses a predetermined dot pattern to decide which pixels can be discarded instead of rendered:

https://en.wikipedia.org/wiki/Ordered_dithering

The result is an illusion of translucency.

The Bayer matrix we started with was an 8×8 matrix, taken from the Wikipedia page above.

I thought the matrix was too small and caused unsightly banding artifacts.

I wanted to use a 16×16 Bayer matrix, but couldn’t find one anywhere online.

So I tried to reverse-engineer the recursive pattern of the 8×8 Bayer matrix.

Just by eyeballing it, I think I could have solved the 16×16 matrix directly,

but I wanted to make the process a bit more fun:

I wrote a tool that can generate Bayer matrices of any power-of-two size.

After switching to the 16×16 Bayer matrix, the banding artifacts were visibly improved.

** = Explosion Sound Delay = **

I actually didn’t contribute much here, but I still think it’s worth mentioning.

In the 2015 E3 demo, Nate and Sully saw the distant tower explode and heard the explosion at the same time.

That’s not plausible: the tower is very far away, so the explosion sound should arrive a little later.

I pointed this out a few weeks before the show, and the art team then added a short delay before the explosion sound.

** = Traditional Chinese Localization = **

I only switched the game to Traditional Chinese subtitles a few weeks before we went gold, and I found many errors.

Most of the errors were word-for-word translations from English that read awkwardly in Chinese.

I didn’t think I had enough time to single-handedly finish the game again while also catching translation errors,

so I asked a few people from QA to each play through some chapters in Traditional Chinese mode,

and I went through their recorded play sessions as they came in.

This approach turned out to be quite efficient:

I managed to file the translation errors I found, and the localization team had enough time to fix them.

** = The End = **

Those are the contributions to Uncharted 4’s development that I consider worth mentioning.

I hope you enjoyed the read.

This post is part of My Career Series.

Here is the Chinese translation of this post.


Now that Uncharted 4 is released, I am able to talk about what I worked on for the project. I mostly worked on AI for single-player buddies and multiplayer sidekicks, as well as some gameplay logic. I’m leaving out things that never went in to the final game and some minor things that are too verbose to elaborate on. So here it goes:

Before I start, I’d like to mention the post system we used for NPCs. I did not work on the core logic of the system; I helped write some client code that makes use of this system.

Posts are discrete positions within navigable space, mostly generated from tools and some hand-placed by designers. Based on our needs, we created various post selectors that rate posts differently (e.g. stealth post selector, combat post selector), and we pick the highest-rated post to tell an NPC to go to.

The buddy follow system was derived from The Last of Us.

The basic idea is that buddies pick positions around the player to follow. These potential positions are fanned out from the player, and must satisfy the following linear path clearance tests: player to position, position to a forward-projected position, forward-projected position to the player.

Climbing is something present in Uncharted 4 that is not in The Last of Us. To incorporate climbing into the follow system, we added the climb follow post selector that picks climb posts for buddies to move to when the player is climbing.

It turned out to be trickier than we thought. Simply telling buddies to use regular follow logic when the player is not climbing, and telling them to use climb posts when the player is climbing, is not enough. If the player quickly switches between climbing and non-climbing states, buddies would oscillate pretty badly between the two states. So we added some hysteresis, where the buddies only switch states when the player has switched states and moved far enough while remaining in that state. In general, hysteresis is a good idea to avoid behavioral flickering.

In some scenarios in the game, we wanted buddies to lead the way for the player. The lead system is ported over from The Last of Us and updated, where designers used splines to mark down the general paths we wanted buddies to follow while leading the player.

In case of multiple lead paths through a level, designers would place multiple splines and turn them on and off via script.

The player’s position is projected onto the spline, and a lead reference point is placed ahead by a distance adjustable by designers. When this lead reference point passes a spline control point marked as a wait point, the buddy would go to the next wait point. If the player backtracks, the buddy would only backtrack when the lead reference point gets too far away from the furthest wait point passed during last advancement. This, again, is hysteresis added to avoid behavioral flickering.

We also incorporated dynamic movement speed into the lead system. “Speed planes” are placed along the spline, based on the distance between the buddy and the player along the spline. There are three motion types NPCs can move in: walk, run, and sprint. Depending on which speed plane the player hits, the buddy picks an appropriate motion type to maintain distance away from the player. Designers can turn on and off speed planes as they see fit. Also, the buddy’s locomotion animation speed is slightly scaled up or down based on the player’s distance to minimize abrupt movement speed change when switching motion types.

In The Last of Us, the player is able to move past a buddy while both remain in cover. This is called cover share.

In The Last of Us, it makes sense for Joel to reach out to the cover wall over Ellie and Tess, who have smaller profiles than Joel. But we thought that it wouldn’t look as good for Nate, Sam, Sully, and Elena, as they all have similar profiles. Plus, Uncharted 4 is much faster-paced, and having Nate reach out his arms while moving in cover would break the fluidity of the movement. So instead, we decided to simply make buddies hunker against the cover wall and have Nate steer slightly around them.

The logic we used is very simple. If the projected player position based on velocity lands within a rectangular boundary around the buddy’s cover post, the buddy aborts current in-cover behavior and quickly hunkers against the cover wall.
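
A minimal sketch of that trigger, assuming positions expressed in the cover wall's local 2D space (x along the wall, y away from it); all names are illustrative, not Naughty Dog's actual API.

```cpp
#include <cmath>

struct Vec2 { float x, y; };

// Project the player's position ahead along velocity by 'lookAhead' seconds,
// then test it against a rectangular boundary around the buddy's cover post.
bool shouldHunker(Vec2 playerPos, Vec2 playerVel, Vec2 coverPost,
                  float halfWidth, float halfDepth, float lookAhead) {
    Vec2 projected = { playerPos.x + playerVel.x * lookAhead,
                       playerPos.y + playerVel.y * lookAhead };
    return std::fabs(projected.x - coverPost.x) <= halfWidth &&
           std::fabs(projected.y - coverPost.y) <= halfDepth;
}
```

Using the velocity-projected position rather than the current position lets the buddy react before the player actually reaches them.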

Medic sidekicks in multiplayer required a whole new behavior that is not present in single-player: reviving downed allies and mirroring the player’s cover behaviors.

Medics try to mimic the player’s cover behavior, and stay as close to the player as possible, so when the player is downed, they are close by to revive the player. If a nearby ally is downed, they would also revive the ally, given that the player is not already downed. If the player is equipped with the RevivePak mod for medics, they would try to throw RevivePaks at revive targets before running to the targets for revival (multiple active revivals reduce revival time); throwing RevivePaks reuses the grenade logic for trajectory clearance test and animation playback, except that grenades were swapped out with RevivePaks.

Crouch-moving in stealth grass is also something new in Uncharted 4. For it to work, we need to somehow mark the environment, so that the player gameplay logic knows whether the player is in stealth grass. Originally, we thought about making the background artists responsible for marking collision surfaces as stealth grass in Maya, but found that the necessary communication between artists and designers made iteration time too long. So we arrived at a different approach to marking stealth grass regions. An extra stealth grass tag was added for designers in the editor, so they could mark, with high precision, the nav polys they’d like the player to treat as stealth grass. With this extra information, we can also rate stealth posts based on whether they are in stealth grass or not. This is useful for buddies moving with the player in stealth.

Since we don’t have listen mode in Uncharted 4 like The Last of Us, we needed to do something to make the player aware of imminent threats, so the player doesn’t feel overwhelmed by unknown enemy locations. Using the enemy perception data, we added the colored threat indicators that inform the player when an enemy is about to notice him/her as a distraction (white), to perceive a distraction (yellow), and to acquire full awareness (orange). We also made the threat indicator raise a buzzing background noise to build up tension and set off a loud stinger when an enemy becomes fully aware of the player, similar to The Last of Us.

This is the last major gameplay feature I took part in before going gold. I don’t usually go to formal meetings at Naughty Dog, but for the last few months before gold, we had at least one meeting per week driven by Bruce Straley or Neil Druckmann, focusing on the AI aspect of the game. After almost every one of these meetings, there was something to change and iterate on in the investigation system. We went through many iterations before arriving at what we shipped in the final game.

There are two things that create distractions and cause enemies to investigate: player presence and dead bodies. When an enemy registers a distraction (the distraction spotter), he tries to get a nearby ally to investigate with him as a pair. The one closer to the distraction becomes the investigator, and the other becomes the watcher. The distraction spotter can become either the investigator or the watcher, and we set up different dialog sets for both scenarios (“There’s something over there. I’ll check it out.” versus “There’s something over there. You go check it out.”).

In order to make the start and end of investigation look more natural, we staggered the timing of enemy movement and the fading of threat indicators, so the investigation pair don’t perform the exact same action at the same time in a mechanical fashion.

If the distraction is a dead body, the investigator would be alerted of player presence and tell everyone else to start searching for the player, irreversibly leaving ambient/unaware state. The dead body discovered would also be highlighted, so the player gets a chance to know what gave him/her away.

On certain difficulty settings, consecutive investigations make enemies investigate more aggressively, giving them a better chance of spotting the player hidden in stealth grass. On crushing difficulty, enemies always investigate aggressively.

This is also among the last few things I helped out with for this project.

Dialog looks refers to the logic that makes characters react to conversations, such as looking at other people and making hand gestures. Previously, on The Last of Us, people spent months annotating all in-game scripted dialogs with looks and gestures by hand. This was something we didn’t want to do again. We had some scripted dialogs that were already annotated by hand, but we needed a default system to handle dialogs that were not annotated. The animators are given parameters to adjust the head turn speed, max head turn angle, look duration, cool-down time, etc.

One of the problems we had early on with the jeep driving section in the Madagascar city level was that the player’s jeep could easily spin out and lose momentum after hitting a wall or an enemy vehicle, throwing the player far behind the convoy and failing the level.

My solution was to temporarily cap the angular velocity and the change of linear velocity direction upon impact against walls and enemy vehicles. This simple solution turned out to be quite effective, making it much harder for players to fail the level due to spin-outs.
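
The idea can be sketched as follows; the state representation and limit values are illustrative assumptions (and heading wrap-around at ±π is ignored for brevity).

```cpp
#include <algorithm>

// After the jeep hits a wall or an enemy vehicle, clamp its yaw rate and
// limit how fast the linear velocity direction may rotate this frame.
struct VehicleState {
    float yawRate;    // radians/sec
    float velHeading; // radians, direction of linear velocity
};

void applyImpactLimits(VehicleState& v, float prevHeading,
                       float maxYawRate, float maxHeadingDelta) {
    // Cap angular velocity so the jeep can't spin out.
    v.yawRate = std::clamp(v.yawRate, -maxYawRate, maxYawRate);
    // Limit the per-frame change in linear velocity direction.
    float delta = v.velHeading - prevHeading;
    delta = std::clamp(delta, -maxHeadingDelta, maxHeadingDelta);
    v.velHeading = prevHeading + delta;
}
```

In practice these limits would only be active for a short window after the impact, then relaxed back to normal handling.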

Drivable vehicles were first introduced in Uncharted 4. Previously, only NPCs could drive vehicles, and those vehicles were constrained to spline rails. I helped handle vehicle deaths.

There are multiple ways to kill enemy vehicles: kill the driver, shoot the vehicle enough times, bump into an enemy bike with your jeep, and ram your jeep into an enemy jeep to cause a spin-out. Based on various causes of death, a death animation is picked to play for the dead vehicle and all its passengers. The animation blends into physics-controlled ragdolls, so the death animation smoothly transitions into physically simulated wreckage.

For bumped deaths of enemy bikes, we used the bike’s bounding box on the XZ plane and the contact position to determine which one of the four directional bump death animations to play.
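
A possible shape of that four-way pick, assuming the contact position has been transformed into the bike's local XZ frame; names and the normalization choice are illustrative.

```cpp
#include <cmath>

enum class BumpDir { Front, Back, Left, Right };

// 'localX'/'localZ' is the contact position in the bike's local frame
// (Z forward); 'halfExtent*' are the bounding box half sizes.
BumpDir pickBumpDeathAnim(float localX, float localZ,
                          float halfExtentX, float halfExtentZ) {
    // Normalize by the box extents so a long, narrow bike doesn't bias
    // the pick toward Front/Back.
    float nx = localX / halfExtentX;
    float nz = localZ / halfExtentZ;
    if (std::fabs(nz) >= std::fabs(nx))
        return nz >= 0.0f ? BumpDir::Front : BumpDir::Back;
    return nx >= 0.0f ? BumpDir::Right : BumpDir::Left;
}
```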

As for jeep spin-outs, the jeep’s rotational deviation from desired driving direction is tested against a spin-out threshold.

When playing death animations, there’s a chance that the dead vehicle can penetrate walls. A sphere cast is used, from the vehicle’s ideal position along the rail if it weren’t dead, to where the vehicle’s body actually is. If a contact is generated from the sphere cast, the vehicle is shifted in the direction of the contact normal by a fraction of penetration amount, so the de-penetration happens gradually across multiple frames, avoiding positional pops.
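
The per-frame correction step can be sketched like this; the recovery fraction of 0.25 is an illustrative value, not the shipped one.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Shift the vehicle a fraction of the penetration depth along the contact
// normal each frame, spreading the correction over several frames so the
// de-penetration never shows up as a positional pop.
Vec3 depenetrateStep(Vec3 pos, Vec3 contactNormal, float penetrationDepth,
                     float recoveryFraction = 0.25f) {
    float step = penetrationDepth * recoveryFraction;
    return { pos.x + contactNormal.x * step,
             pos.y + contactNormal.y * step,
             pos.z + contactNormal.z * step };
}
```

Repeated every frame while the sphere cast still reports contact, the remaining penetration shrinks geometrically toward zero.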

We made a special type of vehicle death, called vehicle death hint. They are context-sensitive death animations that interact with environments. Animators and designers place these hints along the spline rail, and specify entry windows on the splines. If a vehicle is killed within an entry window, it starts playing the corresponding special death animation. This feature started off as a tool to implement the specific epic jeep kill in the 2015 E3 demo.

We wanted to eliminate geometry clipping the camera when the camera gets too close to environmental objects, mostly foliage. So we decided to fade out pixels in pixel shaders based on how close the pixels are to the camera. Using transparency was not an option, because transparency is not cheap, and there’s just too much foliage. Instead, we went with dithering: by combining a pixel’s distance from the camera with a patterned Bayer matrix, some portion of the pixels are fully discarded, creating an illusion of transparency.

Our original Bayer matrix was the 8×8 matrix shown on this Wikipedia page. I thought it was too small and resulted in banding artifacts. I wanted to use a 16×16 Bayer matrix, but it was nowhere to be found on the internet. So I tried to reverse engineer the pattern of the 8×8 Bayer matrix and noticed a recursive pattern. I could have written out a 16×16 matrix by hand through pure inspection, but I wanted to have more fun and wrote a tool that can generate Bayer matrices of any power-of-two size.
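
The recursive pattern mentioned above is the standard Bayer construction: each doubling replaces every entry b with the 2×2 tile [4b+0, 4b+2; 4b+3, 4b+1]. A generator for any power-of-two size could look like this (a sketch, not the actual tool):

```cpp
#include <utility>
#include <vector>

// Generate a size x size Bayer matrix (size must be a power of 2).
// Entries range over 0 .. size*size - 1, each appearing exactly once.
std::vector<std::vector<int>> makeBayer(int size) {
    std::vector<std::vector<int>> m = {{0}};
    for (int n = 1; n < size; n *= 2) {
        std::vector<std::vector<int>> next(2 * n, std::vector<int>(2 * n));
        for (int y = 0; y < n; ++y)
            for (int x = 0; x < n; ++x) {
                int b = 4 * m[y][x];
                next[y][x]         = b;     // top-left
                next[y][x + n]     = b + 2; // top-right
                next[y + n][x]     = b + 3; // bottom-left
                next[y + n][x + n] = b + 1; // bottom-right
            }
        m = std::move(next);
    }
    return m;
}
```

In the pixel shader, the normalized entry `(value + 0.5) / (size * size)` can then be compared against the camera-distance fade factor to decide whether to discard the pixel.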

After switching to the 16×16 Bayer matrix, there was a noticeable improvement on banding artifacts.

This is a really minor contribution, but I’d still like to mention it. A couple of weeks before the 2015 E3 demo, I pointed out that the tower explosion was seen and heard simultaneously, which didn’t make sense. Nate and Sully are very far away from the tower; they should have seen the explosion first and then heard it shortly after. The art team added a slight delay to the explosion sound for the final demo.

I didn’t switch to Traditional Chinese text and subtitles until two weeks before we were locking down for gold, and I found some translation errors. Most of the errors were literal translations from English to Traditional Chinese that just didn’t work in context. I didn’t think I would have time to play through the entire game myself while also looking out for translation errors. So I asked multiple people from QA to play through different chapters of the game in Traditional Chinese, and I went over the recorded gameplay videos as they became available. This proved quite efficient; I managed to log all the translation errors I found, and the localization team was able to correct them before the deadline.

These are pretty much the things I worked on for Uncharted 4 that are worth mentioning. I hope you enjoyed reading it.

Here is the original English post.


Note 1: This post was written before Uncharted 4 went gold.

Note 2: For easy copying to PTT, this post is formatted in BBS style; my apologies if it reads unusually.

It's been almost two years since I started working on Uncharted 4

And now, we're less than two months from the release date

This is the first game I've worked on as a full-time game programmer

(When I worked on Planetary Annihilation, I was still just a summer intern)

Looking back, I realize how far I've come since I first dreamed of making games over a decade ago

I'd like to take this opportunity to share this journey, and to keep a record for myself

** = Back Then, I Wanted to Work at a Video Game Store = **

My first contact with video games was in kindergarten, when my dad bought me an original Game Boy

At first I only had two games: Super Mario Land and Tetris

That's when I got hooked on video games

In second grade, my two cousins got a Super Famicom

I often went over to play it with them, but there were only two controllers, so we had to take turns

After much begging, my parents bought me one too

My favorite games were the Super Bomberman series

I even went back and collected every Bomberman game on the Game Boy

In fourth grade, a friend brought his N64 and Super Mario 64 over to my house

It was the first time I played a 3D game myself, and it truly opened my eyes

We made a "deal": I would buy slightly more games than he did

and we would take turns keeping the N64 and the games, swapping every few months

During this period, whenever someone asked what I wanted to do in the future

I would answer, "I want to work at a video game store, so I can play games all day"

That idea completely changed when I was in fifth grade

My homeroom teacher brought a PS1 to school, so we could play while waiting for our parents to pick us up

One day, while showing off his gaming skills in front of the whole class, he said:

"People who are great at games are impressive, but you know who's even more impressive? The people who make them!"

From that moment on, I decided I would make games

** = My Programming Prehistory = **

At the time, I had no idea what learning resources existed for game development

The best I could do was make simple mouse-interactive animations with Macromedia Flash 3

which I had learned in computer class at school

In junior high, I started using a 3D modeling package called TrueSpace

trying to make simple character models, hoping to put them in a game someday (which of course never happened)

In high school, I joined the computer club

hoping to use the opportunity to learn game development skills

After the first club session, I realized I didn't actually enjoy programming; I preferred art

Programming looked really hard, and it wasn't as eye-catching as art

So I drifted away from the programming side of game development for a while

During this period, I mainly studied Photoshop and 3ds Max

A friend living in Canada introduced me to a website called deviantART

I started producing artwork regularly and uploading it to deviantART

Through the site, I also picked up a lot of colloquial English and internet slang

I noticed people uploading Flash games to the site

which rekindled my interest in making games with Flash

I tried learning programming again; once past the initial steep learning curve, it wasn't so scary anymore

Meanwhile, I started using Swift3D to export 3D models into Flash vector animation format for game development

My most complete work was a dancing bunny game called "Dance! R-Squared"

Unfortunately, I never finished it, and the source files have been lost

All that remains is a Valentine's Day wallpaper I uploaded to deviantART, featuring the four main characters

http://bit.ly/1RpBu8X

** = My First Contact with Naughty Dog Games = **

A friend brought over a PS2 he had bought in the US, along with Jak and Daxter

Most PS2 games sold in Taiwan were Japanese; this was the first American PS2 game I had ever seen

In the PS1 era, I had only vaguely heard of Crash Bandicoot

and didn't know it was made by the same studio as Jak and Daxter

Not long after, in my second year of high school, I couldn't resist my classmates' lunchtime PS2 discussions

and bought a PS2 myself

During summer vacation, my family visited my parents' college friends in California

Their kids also had a PS2, plus a pile of American games I had never seen

Out of curiosity, I asked them to show me a local game store, and they took me to a GameStop

I saw a copy of Jak X on the shelf

The "green-haired protagonist with an orange, squirrel-like sidekick" on the cover

immediately reminded me of the Jak and Daxter I had seen before

The game looked fun, so I bought it

The cinematic impact Jak X delivered was unlike anything I had experienced

That moment cemented Naughty Dog's place in my heart

Afterwards, I started daydreaming about how great it would be to work at Naughty Dog someday

I stared at Naughty Dog's official website

and clicked that tempting link: "Want to join us? Click here!"

A question popped up: "Are you still a student?"

I chose "Yes", and was greeted with a message:

"Sorry, we don't hire part-timers. But if you want to join us after you graduate, pay attention in math class!"

That was a major turning point in my life; my attitude toward math class completely changed

Gradually, I began to find math genuinely interesting and truly useful, and I paid more attention in class

I worked hard on vectors, matrices, geometry, statistics, combinatorics, and probability

Here's a funny little episode from that time:

I tried writing a small ActionScript program to solve systems of two linear equations

so I could slack off on my math homework

But I didn't account for the no-solution case, and the program ended up dividing by zero

so it gave me NaN (not a number) as the answer

At the time I thought the computer was calling me a noob (嫩 sounds like NaN in Mandarin)

In my last semester of high school, my Chinese teacher gave us an assignment:

write a self-recommendation letter to a dream company (which could be fictional)

I was the only one in class who wrote to a foreign company, and that company, of course, was Naughty Dog

Back then, quite a few teachers still frowned upon game development as a career

because "video games make kids neglect their studies and lead them astray"

But my grades were decent, so the teacher didn't say much about my essay

Next up was the college entrance exam

** = Getting Hooked on Naughty Dog Games = **

My entrance exam score was decent; I could get into any school and department I wanted

Determined to pursue game development, I told my parents I had decided to major in computer science

But my dad gave me different advice: he said I should major in electrical engineering instead

His reasoning: he is a pediatrician, but most of his seven years of medical school were spent learning about other specialties

He didn't start specializing in pediatrics until he became an intern

As a result, he can now handle all sorts of conditions beyond pediatrics

His conclusion was that I shouldn't lock my sights onto my favorite field so early

I should broaden my horizons and learn about related fields as well, to better integrate my knowledge

This argument convinced me, so I chose electrical engineering and started learning about electronic hardware

Meanwhile, I also took some computer science electives in parallel

The low-level hardware knowledge really did help me learn software engineering and computer architecture

For example, learning how to build memory out of logic gates taught me how memory access works

and studying gate-level adder and multiplier designs showed me how their architectures and computational costs differ

Later, my sister went to high school in the US

Remembering the good times I had with Jak X, I asked her to buy the Jak trilogy and ship it to Taiwan

Playing through those three games, I witnessed first-class game design and storytelling

My respect for Naughty Dog grew, and I became officially addicted to their games

I went to Naughty Dog's website again

and saw the freshly released teaser for a mysterious PS3 project, which turned out to be the first Uncharted

I wasn't particularly fond of shooters

but I told myself, "This is Naughty Dog; this game is bound to be great!"

So when I visited my sister in the US, I picked up a copy of Uncharted

I knew the PS3 was region-free, so I waited until I was back in Taiwan to buy one

Uncharted was pretty fun and technically stunning; Drake's pants even got wet in the water!

But it didn't move me the way the Jak trilogy had

It felt like just another pretty shooter full of gunfights

My enthusiasm for Naughty Dog cooled a bit

Two years later (in my junior year), I saw the live Uncharted 2 demo at E3

I could hardly believe it: Drake slid down a collapsing building and leapt clear, and the player was in control the whole time!

The moment the game was released, I bought a copy and played through it nonstop

What an incredible game:

magnificent visuals, fun mechanics, a thrilling story, and amusing, lovable characters

My interest in Naughty Dog was rekindled

After that, I graduated from college

** = Going Abroad & Joining the Kennel = **

Before my year of military service, I applied to a game school called DigiPen

I've already written about my time at DigiPen, so allow me to fast-forward here

(For the full story, see http://wp.me/p4mzke-Rb)

In my second semester at DigiPen, Uncharted 3 was released

Despite some flaws in the story, I thoroughly enjoyed it

My fondness for Naughty Dog grew a little more

Not long after, I saw the announcement trailer for The Last of Us

The grim post-apocalyptic survival theme was a far cry from Uncharted's playful tone, which took me completely by surprise

I admired Naughty Dog's willingness to try a different style of game

At PAX Prime 2012, I visited Naughty Dog's booth

where they were showing a live demo of The Last of Us

The demo room's interior was decorated like the hotel where Joel and Ellie fight the hunters

The demo left a deep impression on me

Joel and Ellie fought for survival against enemies who outnumbered them, with ammo running scarce

Putting players under such harsh circumstances and pressure was a first for a Naughty Dog game

After the demo, I got an Ellie T-shirt

and a poster signed by Neil Druckmann and Bruce Straley

I bought The Last of Us the day it came out

Forming an emotional bond with game characters is a rare and wonderful experience

Ellie truly felt like a living companion, not some AI-controlled follower

The rich interactions between Joel and Ellie breathed life into the story

They fought together, cared for each other, shared jokes, and went through emotional turns together

A game this gripping is truly rare

Once again, Naughty Dog earned my deepest admiration and respect

While studying at DigiPen, I always held onto a faint hope:

that one day I could join Naughty Dog and make games

I landed an internship at Uber Entertainment and attended many career fairs

By then, I was confident I could land a job right out of school

Before graduation, my head was full of plans for the future:

first, find a job near school (in the Seattle area)

hopefully at Wargaming, ArenaNet, or Sucker Punch

then, after a few years of work, try applying to Naughty Dog

With some luck, maybe I really could join Naughty Dog

Who knew fate would accelerate my perfect plan a little

In the end, I graduated from DigiPen and went straight to Naughty Dog

(For the full story, see http://wp.me/p4mzke-TQ)

Now, I'm working on Uncharted 4 at Naughty Dog

As a Naughty Dog fan

I'm deeply honored to take part in developing the final chapter of Nathan Drake's adventures

What a truly uncharted journey it has been!

After Uncharted 4 is released

I'll start writing about what I've worked on since joining Naughty Dog

Stay tuned