You can follow me on Twitter.

This post is part of my Gamedev Tutorials Series.

A Chinese translation of this article is available here.

- Trigonometry Basics – Sine & Cosine
- Trigonometry Basics – Tangent, Triangles, And Cannonballs
- Inverse Trigonometric Functions, Slope Angles, And Facing Objects

The dot product is a simple yet extremely useful mathematical tool. It encodes the relationship between two vectors’ magnitudes and directions into a single value. It is useful for computing projection, reflection, lighting, and so much more.

In this tutorial, you’ll learn:

- The geometric meaning of the dot product.
- How to project one vector onto another.
- How to measure an object’s dimension along an arbitrary ruler axis.

- How to reflect a vector relative to a plane.
- How to bounce a ball off a slope.

Let’s say we have two vectors, $\vec{a}$ and $\vec{b}$. Since a vector consists of just a direction and a magnitude (length), it doesn’t matter where we place it in a figure. Let’s position $\vec{a}$ and $\vec{b}$ so that they start at the same point:

The **dot product** is a mathematical operation that takes two vectors as input and returns a scalar value as output. It is the product of the **signed magnitude** of the first vector’s projection onto the second vector and the magnitude of the second vector. Think of projection as casting shadows using parallel light in the direction perpendicular to the vector being projected onto:

We write the dot product of $\vec{a}$ and $\vec{b}$ as $\vec{a} \cdot \vec{b}$ (read *a dot b*).

If the angle between the two vectors is less than 90 degrees, the signed magnitude of the first vector’s projection is positive (thus simply the magnitude of the projection). If the angle is larger than 90 degrees, the signed magnitude is the negated magnitude of the projection.

Which one of the vectors is “the first vector” doesn’t matter. Reversing the vector order gives the same result: $\vec{a} \cdot \vec{b} = \vec{b} \cdot \vec{a}$.

If $\vec{b}$ is a unit vector, the signed magnitude of the projection of $\vec{a}$ onto $\vec{b}$ is simply $\vec{a} \cdot \vec{b}$.

Notice that there’s a right triangle in the figure. Let the angle between $\vec{a}$ and $\vec{b}$ be $\theta$:

Recall from this tutorial that the length of the adjacent side of a right triangle is the length of its hypotenuse multiplied by the cosine of the angle $\theta$, so the signed magnitude of the projection of $\vec{a}$ onto $\vec{b}$ is $|\vec{a}| \cos\theta$.

So the dot product of two vectors can be expressed as the product of each vector’s magnitude and the cosine of the angle between the two, which also reaffirms the property that the order of the vectors doesn’t matter:

$$\vec{a} \cdot \vec{b} = |\vec{a}| |\vec{b}| \cos\theta$$

If both $\vec{a}$ and $\vec{b}$ are unit vectors, then $\vec{a} \cdot \vec{b}$ simply equals $\cos\theta$.

If the two vectors are perpendicular (the angle in between is $90^\circ$), the dot product is zero. If the angle between the two vectors is smaller than $90^\circ$, the dot product is positive. If the angle is larger than $90^\circ$, the dot product is negative. Thus, we can use the sign of the dot product of two vectors to get a very rough sense of how aligned their directions are.

Since $\cos\theta$ monotonically decreases as $\theta$ goes from $0^\circ$ all the way to $180^\circ$, the more similar the directions of the two vectors are, the larger their dot product; the more opposite the directions of the two vectors are, the smaller their dot product. In the extreme cases where the two vectors point in the exact same direction ($\theta = 0^\circ$) and the exact opposite directions ($\theta = 180^\circ$), their dot products are $|\vec{a}| |\vec{b}|$ and $-|\vec{a}| |\vec{b}|$, respectively.
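As a small illustration of using only the sign of the dot product, here is a sketch of an "is the target in front of me?" test. It is written against `System.Numerics` rather than Unity so it can run on its own; the class name `FacingSketch` and the method `IsInFront` are made up for this example.

```csharp
using System.Numerics;

public static class FacingSketch
{
    // Returns true when the target lies in the half-space the observer faces,
    // i.e. the angle between the forward vector and the direction to the
    // target is less than 90 degrees (positive dot product).
    public static bool IsInFront(Vector3 observerPos, Vector3 observerForward, Vector3 targetPos)
    {
        Vector3 toTarget = targetPos - observerPos;
        return Vector3.Dot(observerForward, toTarget) > 0.0f;
    }
}
```

Neither vector needs to be normalized here, because scaling a vector by a positive length never changes the sign of the dot product.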

When we have two 3D vectors as triplets of floats, it isn’t immediately clear what the angle in between them is. Luckily, there’s an alternate way to compute the dot product of two vectors that doesn’t involve taking the cosine of the angle in between. Let’s denote the components of $\vec{a}$ and $\vec{b}$ as follows:

$$\vec{a} = (a_x, a_y, a_z), \quad \vec{b} = (b_x, b_y, b_z)$$

Then the dot product of the two vectors is also equal to the sum of component-wise products, and can be written as:

$$\vec{a} \cdot \vec{b} = a_x b_x + a_y b_y + a_z b_z$$

Simple, and no cosine needed!
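To see the two forms agree on concrete numbers, here is a minimal sketch using `System.Numerics` (not Unity). The class and method names are invented for illustration, and the angle is supplied by hand because we happen to know it for this pair of vectors.

```csharp
using System;
using System.Numerics;

public static class DotFormsSketch
{
    // Component-based form: a.x * b.x + a.y * b.y + a.z * b.z.
    public static float DotByComponents(Vector3 a, Vector3 b)
    {
        return a.X * b.X + a.Y * b.Y + a.Z * b.Z;
    }

    // Cosine-based form: |a| |b| cos(theta), with theta supplied by the caller.
    public static float DotByCosine(Vector3 a, Vector3 b, float thetaRad)
    {
        return a.Length() * b.Length() * (float)Math.Cos(thetaRad);
    }
}
```

For `a = (1, 0, 0)` and `b = (1, 1, 0)`, which are 45 degrees apart, both methods give 1 up to floating-point rounding.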

Unity provides a function `Vector3.Dot` for computing the dot product of two vectors:

    float dotProduct = Vector3.Dot(a, b);

Here is an implementation of the function:

    float Dot(Vector3 a, Vector3 b)
    {
      return a.x * b.x + a.y * b.y + a.z * b.z;
    }

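The formula for computing a vector’s magnitude is $|\vec{a}| = \sqrt{a_x^2 + a_y^2 + a_z^2}$ and can also be expressed using the dot product of the vector with itself:

$$|\vec{a}| = \sqrt{\vec{a} \cdot \vec{a}}$$

Recall the formula $\vec{a} \cdot \vec{b} = |\vec{a}| |\vec{b}| \cos\theta$. This means if we know the dot product and the magnitudes of two vectors, we can reverse-calculate the angle between them by using the arccosine function:

$$\theta = \cos^{-1}\left(\frac{\vec{a} \cdot \vec{b}}{|\vec{a}| |\vec{b}|}\right)$$

If $\vec{a}$ and $\vec{b}$ are unit vectors, we can further simplify the formulas above by skipping the computation of vector magnitudes:

$$\theta = \cos^{-1}(\vec{a} \cdot \vec{b})$$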
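The arccosine formula can be sketched in code as follows. This version uses `System.Numerics` so it runs outside Unity; `AngleSketch` and `AngleBetween` are hypothetical names, and the clamp is a defensive step of my own rather than something the formula itself requires.

```csharp
using System;
using System.Numerics;

public static class AngleSketch
{
    // Returns the angle between a and b in radians, in the range [0, pi].
    // Assumes neither vector has zero (or near-zero) length.
    public static float AngleBetween(Vector3 a, Vector3 b)
    {
        float cos = Vector3.Dot(a, b) / (a.Length() * b.Length());

        // Rounding error can push the ratio slightly outside [-1, 1],
        // which would make Acos return NaN, so clamp it first.
        cos = Math.Max(-1.0f, Math.Min(1.0f, cos));

        return (float)Math.Acos(cos);
    }
}
```

Calling `AngleSketch.AngleBetween(Vector3.UnitX, Vector3.UnitY)` yields roughly pi / 2, and two vectors pointing in opposite directions yield roughly pi.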

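Now that we know the geometric meaning of the dot product as the product of a projected vector’s signed magnitude and another vector’s magnitude, let’s see how we can project one vector onto another. Let $\mathrm{proj}_{\vec{b}}(\vec{a})$ denote the projection of $\vec{a}$ onto $\vec{b}$:

The unit vector in the direction of $\vec{b}$ is $\frac{\vec{b}}{|\vec{b}|}$, so if we scale it by the signed magnitude of the projection of $\vec{a}$ onto $\vec{b}$, then we will get $\mathrm{proj}_{\vec{b}}(\vec{a})$. In other words, $\mathrm{proj}_{\vec{b}}(\vec{a})$ is parallel to the direction of $\vec{b}$ and has a magnitude equal to that of the projection of $\vec{a}$ onto $\vec{b}$.

Since the dot product $\vec{a} \cdot \vec{b}$ is the product of the magnitude of $\vec{b}$ and the signed magnitude of the projection of $\vec{a}$ onto $\vec{b}$, the signed magnitude of $\mathrm{proj}_{\vec{b}}(\vec{a})$ is just the dot product of $\vec{a}$ and $\vec{b}$ divided by the magnitude of $\vec{b}$:

$$\frac{\vec{a} \cdot \vec{b}}{|\vec{b}|}$$

Multiplying this signed magnitude with the unit vector $\frac{\vec{b}}{|\vec{b}|}$ gives us the formula for vector projection:

$$\mathrm{proj}_{\vec{b}}(\vec{a}) = \frac{\vec{a} \cdot \vec{b}}{|\vec{b}|} \frac{\vec{b}}{|\vec{b}|}$$

Recall that $|\vec{b}| = \sqrt{\vec{b} \cdot \vec{b}}$, so we can also write the projection formula as:

$$\mathrm{proj}_{\vec{b}}(\vec{a}) = \frac{\vec{a} \cdot \vec{b}}{\vec{b} \cdot \vec{b}} \vec{b}$$

And if $\vec{b}$, the vector to project onto, is a unit vector, the projection formula can be further simplified:

$$\mathrm{proj}_{\vec{b}}(\vec{a}) = (\vec{a} \cdot \vec{b}) \vec{b}$$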

Unity provides a function `Vector3.Project` that computes the projection of one vector onto another:

    Vector3 projection = Vector3.Project(vec, onto);

Here is an implementation of the function:

    Vector3 Project(Vector3 vec, Vector3 onto)
    {
      float numerator = Vector3.Dot(vec, onto);
      float denominator = Vector3.Dot(onto, onto);
      return (numerator / denominator) * onto;
    }

Sometimes we need to guard against a potential degenerate case, where the vector being projected onto is a zero vector or a vector with an overly small magnitude, producing a numerical explosion as the projection involves division by zero or near-zero. This can happen with Unity’s `Vector3.Project` function.

One way to handle this is to compute the squared magnitude of the vector being projected onto. Then, if it is too small, use a fallback vector (e.g. the unit +X vector, the forward vector of a character, etc.):

    Vector3 SafeProject(Vector3 vec, Vector3 onto, Vector3 fallback)
    {
      // Epsilon is a small positive threshold defined elsewhere
      float sqrMag = onto.sqrMagnitude;
      if (sqrMag > Epsilon) // test against a small number
        return Vector3.Project(vec, onto);
      else
        return Vector3.Project(vec, fallback);
    }

Here’s an exercise for vector projection: make a ruler that measures an object’s dimension along an arbitrary axis.

A ruler is represented by a base position (a point) and an axis (a unit vector):

    struct Ruler
    {
      public Vector3 Base;
      public Vector3 Axis;
    }

Here’s how you project a point onto the ruler. First, find the relative vector from the ruler’s base position to the point. Next, project this relative vector onto the ruler’s axis. Finally, the point’s projection is the ruler’s base position offset by the projected relative vector:

    Vector3 Project(Vector3 vec, Ruler ruler)
    {
      // compute relative vector
      Vector3 relative = vec - ruler.Base;

      // projection (the axis is a unit vector)
      float relativeDot = Vector3.Dot(relative, ruler.Axis);
      Vector3 projectedRelative = relativeDot * ruler.Axis;

      // offset from base
      Vector3 result = ruler.Base + projectedRelative;

      return result;
    }

The intermediate `relativeDot` value above basically measures how far away the point’s projection is from the ruler’s base position, in the direction of the ruler’s axis if positive, or in the opposite direction of the ruler’s axis if negative.

If we compute such measurement for each vertex of an object’s mesh and find the minimum and maximum measurements, then we can obtain the object’s dimension measured along the ruler’s axis by subtracting the minimum from the maximum. Offsetting from the ruler’s base position by the ruler’s axis vector multiplied by these two extreme values gives us the two ends of the projection of the object onto the ruler.

    void Measure
    (
      Mesh mesh,
      Ruler ruler,
      out float dimension,
      out Vector3 minPoint,
      out Vector3 maxPoint
    )
    {
      float min = float.MaxValue;
      float max = float.MinValue;

      foreach (Vector3 vert in mesh.vertices)
      {
        Vector3 relative = vert - ruler.Base;
        float relativeDot = Vector3.Dot(relative, ruler.Axis);
        min = Mathf.Min(min, relativeDot);
        max = Mathf.Max(max, relativeDot);
      }

      dimension = max - min;
      minPoint = ruler.Base + min * ruler.Axis;
      maxPoint = ruler.Base + max * ruler.Axis;
    }
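To try the measuring idea without a Unity mesh, here is a sketch that works on a plain array of points using `System.Numerics`. The names `RulerSketch` and `MeasureExtent` are invented for this example, and the axis is assumed to be unit length, as with the ruler above.

```csharp
using System;
using System.Numerics;

public static class RulerSketch
{
    // Returns the extent of the points measured along a unit-length axis
    // that passes through basePoint (max measurement minus min measurement).
    public static float MeasureExtent(Vector3[] points, Vector3 basePoint, Vector3 axis)
    {
        float min = float.MaxValue;
        float max = float.MinValue;

        foreach (Vector3 p in points)
        {
            // signed distance of the point's projection from the base
            float d = Vector3.Dot(p - basePoint, axis);
            min = Math.Min(min, d);
            max = Math.Max(max, d);
        }

        return max - min;
    }
}
```

Measuring the eight corners of a unit cube along the X axis gives 1, while measuring along the normalized diagonal `(1, 1, 1)` gives roughly the square root of 3, the length of the cube's space diagonal.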

Now we are going to take a look at how to reflect a vector, denoted $\vec{v}$, relative to a plane with its normal vector denoted $\vec{n}$:

We can decompose the vector to be reflected into a parallel component (denoted $\vec{v}_\parallel$) and a perpendicular component (denoted $\vec{v}_\perp$) with respect to the plane:

The perpendicular component is the projection of the vector onto the plane’s normal, and the parallel component can be obtained by subtracting the perpendicular component from the vector:

$$\vec{v}_\perp = \mathrm{proj}_{\vec{n}}(\vec{v}), \quad \vec{v}_\parallel = \vec{v} - \vec{v}_\perp$$

Flipping the direction of the perpendicular component and adding it to the parallel component gives us the reflected vector off the plane.

Let’s denote the reflection $\vec{v}'$:

$$\vec{v}' = \vec{v}_\parallel - \vec{v}_\perp$$

If we substitute $\vec{v}_\parallel$ with $\vec{v} - \vec{v}_\perp$, we get an alternative formula:

$$\vec{v}' = \vec{v} - 2 \vec{v}_\perp$$

Unity provides a function `Vector3.Reflect` for computing vector reflection:

    Vector3 reflection = Vector3.Reflect(vec, normal);

Here is an implementation of the function using the first reflection formula:

    Vector3 Reflect(Vector3 vec, Vector3 normal)
    {
      Vector3 perpendicular = Vector3.Project(vec, normal);
      Vector3 parallel = vec - perpendicular;
      return parallel - perpendicular;
    }

And here is an implementation using the alternative formula:

    Vector3 Reflect(Vector3 vec, Vector3 normal)
    {
      return vec - 2.0f * Vector3.Project(vec, normal);
    }
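As a quick sanity check on concrete numbers, here is a sketch of the alternative formula for the common case where the normal is already a unit vector, so the projection reduces to a dot product times the normal. It uses `System.Numerics` instead of Unity, and `ReflectSketch` / `ReflectUnitNormal` are names made up for this example.

```csharp
using System.Numerics;

public static class ReflectSketch
{
    // Reflects vec off a plane whose normal is assumed to be unit length:
    // vec - 2 * (vec . n) * n
    public static Vector3 ReflectUnitNormal(Vector3 vec, Vector3 unitNormal)
    {
        return vec - 2.0f * Vector3.Dot(vec, unitNormal) * unitNormal;
    }
}
```

A vector `(1, -1, 0)` heading down onto a floor with normal `(0, 1, 0)` comes back as `(1, 1, 0)`: the parallel part is untouched and the perpendicular part is flipped.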

Now that we know how to reflect a vector relative to a plane, we are well-equipped to simulate a ball bouncing off a slope.

We are going to use the Euler Method mentioned in a previous tutorial to simulate the trajectory of a ball under the influence of gravity.

    ballVelocity += gravity * deltaTime;
    ballCenter += ballVelocity * deltaTime;

In order to detect when the ball hits the slope, we need to know how to detect when a ball penetrates a plane.

A sphere can be defined by a center and a radius. A plane can be defined by a normal vector and a point on the plane. Let’s denote the sphere’s center $C$, the sphere radius $r$, the plane normal $\vec{n}$ (a unit vector), and a point on the plane $P$. Also, let the vector from $P$ to $C$ be denoted $\vec{u}$.

If the sphere does not penetrate the plane, the component of $\vec{u}$ perpendicular to the plane, denoted $\vec{u}_\perp$, should be in the same direction as $\vec{n}$ and have a magnitude no less than $r$.

In other words, the sphere does not penetrate the plane if $\vec{u} \cdot \vec{n} \ge r$; otherwise, the sphere is penetrating the plane by the amount $r - \vec{u} \cdot \vec{n}$ and its position needs to be corrected.

In order to correct a penetrating sphere’s position, we can simply move the sphere in the direction of the plane’s normal by the penetration amount. This is an approximated solution and not physically correct, but it’s good enough for this exercise.

    // returns original sphere center if not penetrating
    // or corrected sphere center if penetrating
    void SphereVsPlane
    (
      Vector3 c,       // sphere center
      float r,         // sphere radius
      Vector3 n,       // plane normal (unit vector)
      Vector3 p,       // point on plane
      out Vector3 cNew // sphere center output
    )
    {
      // original sphere position as default result
      cNew = c;

      Vector3 u = c - p;
      float d = Vector3.Dot(u, n);
      float penetration = r - d;

      // penetrating?
      if (penetration > 0.0f)
      {
        cNew = c + penetration * n;
      }
    }

And then we insert the positional correction logic after the integration.

    ballVelocity += gravity * deltaTime;
    ballCenter += ballVelocity * deltaTime;

    Vector3 newBallCenter;
    SphereVsPlane
    (
      ballCenter,
      ballRadius,
      planeNormal,
      pointOnPlane,
      out newBallCenter
    );

    ballCenter = newBallCenter;

We also need to reflect the sphere’s velocity relative to the slope upon positional correction due to penetration, so it bounces off correctly.

The animation above shows a perfect reflection and doesn’t seem natural. We’d normally expect some sort of degradation in the bounced ball’s velocity, so it bounces less with each bounce.

This is typically modeled as a **restitution** value between the two colliding objects. With 100% restitution, the ball would bounce off the slope with perfect velocity reflection. With 50% restitution, the magnitude of the ball’s velocity component perpendicular to the slope would be cut in half. The restitution value is the ratio of magnitudes of the ball’s perpendicular velocity components after versus before the bounce. Here is a revised vector reflection function with restitution taken into account:

    Vector3 Reflect
    (
      Vector3 vec,
      Vector3 normal,
      float restitution
    )
    {
      Vector3 perpendicular = Vector3.Project(vec, normal);
      Vector3 parallel = vec - perpendicular;
      return parallel - restitution * perpendicular;
    }

Here is the modified `SphereVsPlane` function that takes variable restitution into account:

    // returns original sphere center & velocity if not penetrating
    // or corrected sphere center & reflected velocity if penetrating
    void SphereVsPlane
    (
      Vector3 c,        // sphere center
      float r,          // sphere radius
      Vector3 v,        // sphere velocity
      Vector3 n,        // plane normal (unit vector)
      Vector3 p,        // point on plane
      float e,          // restitution
      out Vector3 cNew, // sphere center output
      out Vector3 vNew  // sphere velocity output
    )
    {
      // original sphere position & velocity as default result
      cNew = c;
      vNew = v;

      Vector3 u = c - p;
      float d = Vector3.Dot(u, n);
      float penetration = r - d;

      // penetrating?
      if (penetration > 0.0f)
      {
        cNew = c + penetration * n;
        vNew = Reflect(v, n, e);
      }
    }

And the positional correction logic is replaced with a complete bounce logic:

    ballVelocity += gravity * deltaTime;
    ballCenter += ballVelocity * deltaTime;

    Vector3 newBallCenter;
    Vector3 newBallVelocity;
    SphereVsPlane
    (
      ballCenter,
      ballRadius,
      ballVelocity,
      planeNormal,
      pointOnPlane,
      restitution,
      out newBallCenter,
      out newBallVelocity
    );

    ballCenter = newBallCenter;
    ballVelocity = newBallVelocity;
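Putting the pieces together, here is a self-contained sketch of one simulation step that can run outside Unity using `System.Numerics`. The names `BounceSketch` and `Step` are hypothetical. One detail is my own addition rather than part of the code above: the velocity is only reflected when the ball is actually moving into the plane, which avoids flipping an already-outgoing velocity back inward.

```csharp
using System.Numerics;

public static class BounceSketch
{
    // Reflection with restitution; n is assumed to be a unit vector.
    public static Vector3 Reflect(Vector3 v, Vector3 n, float restitution)
    {
        Vector3 perpendicular = Vector3.Dot(v, n) * n;
        Vector3 parallel = v - perpendicular;
        return parallel - restitution * perpendicular;
    }

    // Advances the ball by one time step (Euler method) and then
    // resolves penetration against the plane.
    public static void Step(
        ref Vector3 center, ref Vector3 velocity, float radius,
        Vector3 gravity, Vector3 planeNormal, Vector3 pointOnPlane,
        float restitution, float deltaTime)
    {
        velocity += gravity * deltaTime;
        center += velocity * deltaTime;

        float d = Vector3.Dot(center - pointOnPlane, planeNormal);
        float penetration = radius - d;

        if (penetration > 0.0f)
        {
            // push the ball back out along the plane normal
            center += penetration * planeNormal;

            // assumption: only bounce when moving into the plane
            if (Vector3.Dot(velocity, planeNormal) < 0.0f)
                velocity = Reflect(velocity, planeNormal, restitution);
        }
    }
}
```

Calling `Step` once per frame with a fixed `deltaTime` such as 1/60 of a second, a ball dropped onto a horizontal plane with 50% restitution bounces lower each time and never ends a step below the plane.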

Finally, now we can have balls with different restitution values against a slope:

In this tutorial, we have been introduced to the geometric meaning of the dot product and its formulas (cosine-based and component-based).

We have also seen how to use the dot product to project vectors, and how to use vector projection to measure objects along an arbitrary ruler axis.

Finally, we have learned how to use the dot product to reflect vectors, and how to use vector reflection to simulate balls bouncing off a slope.

If you’ve enjoyed this tutorial and would like to see more, please consider supporting me on Patreon. By doing so, you can also get updates on future tutorials. Thanks!


At this point, we have learned about the three basic trigonometric functions: sine, cosine, and tangent. Now, we are going to take a look at their **inverse functions**, as well as how they can be utilized in games.

In this tutorial, you’ll learn:

- The inverse functions of the three basic trigonometric functions.
- How to compute the angle of a slope given a desired slope value.

- The domains and ranges of inverse trigonometric functions.
- The special convenience inverse trigonometric function **atan2**.
- How to make an object face towards the mouse cursor.

A function can be treated like a black box that takes some input and gives you some output. If a function $f$ takes an input $x$ and spits out an output $y$, we can write it as $y = f(x)$ (read *y equals f of x*). Meanwhile, if a function can take an output of $f$ and give back the input that $f$ takes to produce such output, we say that such function is the **inverse** of $f$, and we write it as $f^{-1}$ (read *f inverse*).

In other words, if the function $f$ takes $x$ and gives you $y$, which can be written as $y = f(x)$, then $f^{-1}$ can take $y$ as input and give you $x$, which can be written as $x = f^{-1}(y)$.

An example of a function versus its inverse is a function that **adds one** to its input and a function that **subtracts one** from its input. Let $f(x) = x + 1$ denote the function that adds one to $x$, and $f^{-1}(x) = x - 1$ denote the one that subtracts one from $x$. If we feed $1$ into $f$, we get:

$$f(1) = 1 + 1 = 2$$

Now, if we feed $2$ back into $f^{-1}$, we get our original input back:

$$f^{-1}(2) = 2 - 1 = 1$$

We already know that trigonometric functions take an angle as input and produce a number as output. We can feed the output of a trigonometric function (a real number) into its inverse function, and the inverse function would spit out the original input to the trigonometric function (an angle **in radians**). For example, $\sin(\frac{\pi}{2}) = 1$, and $\sin^{-1}(1) = \frac{\pi}{2}$.

Inverse trigonometric functions have special names. Rather than “sine inverse”, the inverse of sine, written as $\sin^{-1}$, is called **arcsine**. Similarly, $\cos^{-1}$ and $\tan^{-1}$ are called **arccosine** and **arctangent**, respectively. In Unity, here’s how you’d call these three inverse trigonometric functions:

    float sinAngle = Mathf.Asin(sinValue); // arcsine
    float cosAngle = Mathf.Acos(cosValue); // arccosine
    float tanAngle = Mathf.Atan(tanValue); // arctangent

As a quick example, if we know the ratio of vertical rise versus horizontal offset of a hill in a game level, how do we compute the angle of the slope? Using the illustration below, how do we compute $\theta$ from the vertical rise $y$ and horizontal offset $x$?

The goal is to express $\theta$ using $x$ and $y$. First, we can relate $\theta$ to $x$ and $y$ using the tangent function:

$$\tan\theta = \frac{y}{x}$$

Next, we can obtain $\theta$ by feeding $\frac{y}{x}$ into $\tan^{-1}$:

$$\theta = \tan^{-1}\left(\frac{y}{x}\right)$$

Alternatively, we can view the equation above as the result of taking the arctangent of both sides of the previous equation. Generally, $f^{-1}(f(x))$ cancels out and gives you $x$; similarly, $f(f^{-1}(y))$ cancels out and gives you $y$.

The angle $\theta$ is in radians. As mentioned in an earlier tutorial, we can convert the angle’s unit to degrees by multiplying it with $\frac{180}{\pi}$.

So, we can make a little interactive program that allows the user to move a point that forms a slope with the origin, and use the point’s coordinates to compute and display the slope angle.

And here’s the code:

    Vector3 point = p.transform.position;

    // compute slope angle in radians
    float angleRad = Mathf.Atan(point.y / point.x);

    // convert to degrees
    // Mathf.Rad2Deg is a constant equal to 180.0f / Pi
    float angleDeg = angleRad * Mathf.Rad2Deg;

    text = angleDeg + "°";

When using inverse trigonometric functions, it’s important to understand their **domains** and **ranges**.

The domain of a function is the collection of all valid values as input, and the range of a function is the collection of all possible output values.

For example, the domain of $\sin$ is the collection of all real numbers, because you can pass any angle to it as input. And the range of $\sin$ is $[-1, 1]$, which is a notation for the collection including all values between and including -1 and 1. If a parenthesis is used instead of a bracket, it means that side of the boundary is not included in the collection; for example, $[0, 10)$ denotes a collection including all values between 0 and 10, but only including the boundary 0 and not the boundary 10.

The inverse of a function should simply have a domain and range equal to the range and domain, respectively, of the corresponding function, right? For inverse trigonometric functions, that’s not the case.

Trigonometric functions are periodic, which means multiple different input values can result in the same output value. $\sin$ and $\cos$ can even have different input values within a single period resulting in the same output value.

Let’s use $\sin$ as an example again. Both $\sin(\frac{\pi}{2})$ and $\sin(\frac{5\pi}{2})$ give the same value 1. So what is the output of $\sin^{-1}(1)$? It can’t be simultaneously equal to $\frac{\pi}{2}$, $\frac{5\pi}{2}$, or other inputs that make $\sin$ equal to 1. In fact, the ranges of inverse trigonometric functions are chosen to be of limited range, commonly agreed upon and universally used.

Since the ranges of $\sin$ and $\cos$ are both $[-1, 1]$, the domains of $\sin^{-1}$ and $\cos^{-1}$ are both $[-1, 1]$ as well. The ranges of $\sin^{-1}$ and $\cos^{-1}$ are chosen to be $[-\frac{\pi}{2}, \frac{\pi}{2}]$ and $[0, \pi]$, respectively. These ranges cover an angle range of $\pi$ radians, or 180 degrees.

Hence, $\sin^{-1}(1)$ is equal to $\frac{\pi}{2}$, which is the one and only input that makes $\sin$ equal to 1 and lies within the range $[-\frac{\pi}{2}, \frac{\pi}{2}]$.

As for $\tan^{-1}$, since the range of $\tan$ is the collection of all real numbers, the domain of $\tan^{-1}$ is the collection of all real numbers as well. And the range of $\tan^{-1}$ is chosen to be $(-\frac{\pi}{2}, \frac{\pi}{2})$, the same span as that of $\sin^{-1}$ but excluding the two boundaries.
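The limited range of arcsine is easy to observe in code. The sketch below uses the standard `System.Math` functions in double precision; `InverseTrigSketch` and `SinRoundTrip` are names invented for this example.

```csharp
using System;

public static class InverseTrigSketch
{
    // Feeds an angle through sine and then back through arcsine.
    // The result equals the input only when the input already lies
    // within arcsine's range of [-pi/2, pi/2].
    public static double SinRoundTrip(double angleRad)
    {
        return Math.Asin(Math.Sin(angleRad));
    }
}
```

`SinRoundTrip(Math.PI / 6.0)` gives back roughly pi / 6, but `SinRoundTrip(5.0 * Math.PI / 2.0)` gives roughly pi / 2 rather than 5 pi / 2. Passing a value outside the domain, such as `Math.Asin(2.0)`, returns `NaN`.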

Let’s say we have a point $P = (p_x, p_y)$ in 2D, and it is in the first quadrant, i.e. $p_x > 0$ and $p_y > 0$. Let $\theta$ be the angle from the +X axis to the line segment connecting the origin and $P$.

We know that $\tan\theta = \frac{p_y}{p_x}$, so we can compute $\theta$ from the coordinates of $P$ using the arctangent function: $\theta = \tan^{-1}(\frac{p_y}{p_x})$. Since both $p_x$ and $p_y$ are positive, $\theta$ would lie within $(0, \frac{\pi}{2})$, encompassed within the full range of arctangent, which is $(-\frac{\pi}{2}, \frac{\pi}{2})$.

This is what the computation looks like in code:

    float angle = Mathf.Atan(p.y / p.x);

What if $P$ is in the fourth quadrant, i.e. $p_x > 0$ and $p_y < 0$? $\frac{p_y}{p_x}$ would become negative and $\tan^{-1}$ would output a negative angle within $(-\frac{\pi}{2}, 0)$, which is also encompassed within the full range of arctangent, $(-\frac{\pi}{2}, \frac{\pi}{2})$.

Problems arise when we have $P$ in the second or third quadrant. If $P$ is in the second quadrant, i.e. $p_x < 0$ and $p_y > 0$, the fraction $\frac{p_y}{p_x}$ is negative. We can find a point $P$ in the second quadrant that results in a ratio equal to a ratio from a point $Q = (q_x, q_y)$ in the fourth quadrant. One such point pair are those that satisfy $Q = -P$.

The two points $P$ and $Q$ in the figure below have identical coordinate ratios: $\frac{p_y}{p_x} = \frac{q_y}{q_x}$.

Also seen in the figure above is that the coordinate ratios of points $P$ and $Q$, when compared to the coordinate ratios of their mirror images in the first quadrant and in the third quadrant, only differ in signs (negative instead of positive). All the absolute **acute angles** (angles less than 90 degrees) between the X axis and the line segments connecting the origin and the points are identical.

The ratio $\frac{q_y}{q_x}$ is equal to the ratio $\frac{-p_y}{-p_x}$, which is in turn equal to $\frac{p_y}{p_x}$, because the two negative signs cancel out. So, if we pass $\frac{p_y}{p_x}$ as input to the arctangent function, $\tan^{-1}(\frac{p_y}{p_x})$ actually gives you the same negative angle as $\tan^{-1}(\frac{q_y}{q_x})$, because an angle in the fourth quadrant is within the range of arctangent, but an angle in the second quadrant is not.

When we pass in $\frac{p_y}{p_x}$ to the arctangent function, what we really want to get is the green positive **obtuse angle** (angle larger than 90 degrees) shown in the figure below, not the red negative acute one. We always want to start measuring angles from the +X direction.

In order to do so, before combining $p_x$ and $p_y$ into a ratio and passing it to the arctangent function, we check the signs of $p_x$ and $p_y$ first to see which quadrant the point is in. And if the point’s angle lies outside the range $(-\frac{\pi}{2}, \frac{\pi}{2})$, we fix up the output of the arctangent function to get the output angle in the correct quadrant. Here’s the code that does this fix-up:

    // range of this function is (-pi, pi]
    float FixedUpAtan(float py, float px)
    {
      if (px > 0.0f) // "normal", no fix-up needed
      {
        // py > 0.0f : first quadrant
        // py < 0.0f : fourth quadrant
        return Mathf.Atan(py / px);
      }
      else if (px < 0.0f) // fix-up needed
      {
        if (py > 0.0f) // second quadrant
          return Mathf.PI + Mathf.Atan(py / px);
        else if (py < 0.0f) // third quadrant
          return -Mathf.PI + Mathf.Atan(py / px);
        else // angle on negative X axis
          return Mathf.PI;
      }
      else // infinity
      {
        if (py > 0.0f)
          return 0.5f * Mathf.PI; // ratio is positive infinity
        else if (py < 0.0f)
          return -0.5f * Mathf.PI; // ratio is negative infinity
        else
          return 0.0f; // degenerate input (the origin)
      }
    }

That seems like quite a lot of work. Luckily, the standard math libraries of almost all programming languages provide a convenience function called **atan2**, which has a full 360-degree range of $(-\pi, \pi]$ and does exactly what the code above does (most likely in a more efficient and optimized fashion). Note that the argument order is Y first and X second. **Atan2** in different libraries may have different ordering of the two arguments, but based on what I’ve seen, Y followed by X is pretty common.

I often see a misconception that **atan2** is just an alternative to the arctangent function and doesn’t do anything extra that arctangent cannot do. This is actually incorrect. The arctangent function only takes a single value as input, and its output range is $(-\frac{\pi}{2}, \frac{\pi}{2})$. On the other hand, **atan2** takes **two** values as input ($p_y$ and $p_x$ before they are combined into a single ratio), and the output has a full 360-degree range of $(-\pi, \pi]$.
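The difference shows up immediately on a second-quadrant point. This sketch uses `System.Math` in double precision rather than Unity's `Mathf`; the class and method names are made up for illustration.

```csharp
using System;

public static class Atan2Sketch
{
    // Angle in degrees from the single ratio y / x (plain arctangent).
    public static double AngleFromRatioDeg(double y, double x)
    {
        return Math.Atan(y / x) * 180.0 / Math.PI;
    }

    // Angle in degrees from the two separate coordinates (atan2).
    // Note the argument order: Y first, X second.
    public static double AngleFromAtan2Deg(double y, double x)
    {
        return Math.Atan2(y, x) * 180.0 / Math.PI;
    }
}
```

For the point `(-1, 1)`, the ratio-based version reports about -45 degrees (a fourth-quadrant angle), while the atan2-based version reports about 135 degrees, which is the angle we actually want.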

Lastly, let’s look at a classic example of facing an object towards the mouse cursor.

First, find the intersection between the ray under the mouse cursor and the ground plane. Then, place an object at that intersection, creating the effect of the object following the mouse cursor in 3D. This object is our look target.

    Camera cam = Camera.current;
    Vector3 mouse = Input.mousePosition;
    Ray ray = cam.ScreenPointToRay(mouse);
    float rayDist;
    plane.Raycast(ray, out rayDist);
    sphere.position = ray.GetPoint(rayDist);

Next, let’s use our old friend UFO Bunny from Boing Kit again. When un-rotated, her forward vector is in the +X direction, and her left vector is in the +Z direction. We want to face her towards the look target.

Then, let UFO Bunny be the origin, and calculate the coordinates of the look target relative to her:

Vector3 coord = sphere.transform.position - ufoBunny.transform.position;

Now, let’s mark up the scene with an angle $\theta$ between the X axis and the line segment connecting UFO Bunny and the look target:

As shown before, the angle can be calculated from the convenient **atan2** function:

    float thetaRad = Mathf.Atan2(coord.z, coord.x); // in radians

Recall this figure:

This figure shows the XY plane, and as $\theta$ increases, the point on the circle rotates counterclockwise around the origin. The rotation axis of such rotation is the +Z axis (later tutorials will explain this in more detail). The UFO Bunny and the look target lie on the XZ plane; to translate the figure on the XY plane to the XZ plane, we map the +X axis to the +X axis, the +Y axis to the +Z axis, and the rotation axis of the +Z axis to the -Y axis.

Now that we have the rotation axis and the desired rotation angle, we can finally construct a **quaternion** representing such rotation. Quaternions will also be covered in later tutorials. For now, we just need to know that quaternion is a type of data Unity uses to represent object rotation.

    float thetaDeg = thetaRad * Mathf.Rad2Deg; // in degrees
    Vector3 axis = Vector3.down; // (0, -1, 0) == -Y axis
    Quaternion rot = Quaternion.AngleAxis(thetaDeg, axis);
    ufoBunny.transform.rotation = rot;
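The same construction can be checked outside Unity with `System.Numerics`, whose `Quaternion.CreateFromAxisAngle` takes the angle in radians (unlike Unity's `Quaternion.AngleAxis`, which takes degrees). This is only a sketch: `FaceTargetSketch` and `FaceOnXZ` are hypothetical names, and the object is assumed to face +X when un-rotated, as UFO Bunny does.

```csharp
using System;
using System.Numerics;

public static class FaceTargetSketch
{
    // Returns a rotation that turns an object whose un-rotated forward
    // vector is +X so that it faces `target` on the XZ plane.
    public static Quaternion FaceOnXZ(Vector3 self, Vector3 target)
    {
        Vector3 coord = target - self;

        // atan2 takes the "vertical" coordinate first: Z here, then X
        float thetaRad = (float)Math.Atan2(coord.Z, coord.X);

        // rotate about the -Y axis by theta (radians)
        return Quaternion.CreateFromAxisAngle(-Vector3.UnitY, thetaRad);
    }
}
```

Rotating the un-rotated forward vector with `Vector3.Transform(Vector3.UnitX, FaceTargetSketch.FaceOnXZ(self, target))` should give a unit vector pointing from `self` towards `target`, which is a handy way to confirm the choice of the -Y rotation axis.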

And here’s our final result:

Note: Unity already provides helper functions like `Quaternion.LookRotation` and `Transform.LookAt` that can achieve the same effect. But the purpose of this tutorial is to help understand inverse trigonometric functions.

In this tutorial, we have been introduced to the inverse trigonometric functions, how they relate to their corresponding trigonometric functions, and their domains and ranges.

Also, we have seen that the arctangent function doesn’t have a full 360-degree range, but a convenient utility function **atan2** does.

Lastly, we have learned how to use the **atan2** function to implement the classic example of facing an object towards the mouse cursor.


In the previous tutorial, we have learned about two basic trigonometric functions: sine & cosine. This time, we are going to look at another basic trigonometric function: tangent. Together, these three functions form the basis of trigonometry, and they can be used to solve all sorts of geometric problems that arise in game development.

In this tutorial, you’ll learn:

- A geometric interpretation of another basic trigonometric function: tangent.
- The relationships among sine, cosine, and tangent.
- How to use tangent to create smooth intro and outro motion.

- How to relate angles and sides of right triangles using trigonometric functions.
- How to simulate a cannonball, given an initial speed and an elevation angle.
- How to draw predicted trajectories even before firing the cannonball.

- How to place cannonball targets, given a horizontal distance and an elevation angle.

Let’s look at the unit circle from the last tutorial, with a point $P$ on it, as well as the angle $\theta$ between the X axis (+X direction) and the line segment formed by $P$ and the origin.

Recall that the coordinates of $P$ are $(\cos\theta, \sin\theta)$. This time we are going to look at a new trigonometric function: $\tan\theta$ (tangent of theta). It is the **slope** of the line segment between $P$ and the origin.

The slope of a line is its ratio of vertical change versus horizontal change. For example, let’s look at this line segment:

To move from point $A$ to point $B$, we walk 3 units in the +X direction and then 2 units in the +Y direction, so the slope of the line is $\frac{2}{3}$.

And for a line segment that goes “downhill” like this:

The slope would be a negative value, since the vertical change versus horizontal change is negative.

Now, back to the unit circle figure:

We see that moving from the origin to $P$ involves a horizontal change of $\cos\theta$ and a vertical change of $\sin\theta$, so the slope of the line segment between the origin and $P$ is $\frac{\sin\theta}{\cos\theta}$, hence $\tan\theta = \frac{\sin\theta}{\cos\theta}$.

But that’s just a mathematical expression. Here’s where $\tan\theta$ visually fits into the unit circle figure. Let’s draw a tangential line to the circle at $P$, i.e. a line that goes through $P$ and is perpendicular to the line segment between $P$ and the origin:

Let’s just look at the portion of this tangential line that is between $P$ and the X axis, and mark up some of the points:

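Let’s label the origin $O$, the foot of the perpendicular dropped from $P$ onto the X axis $Q$, and the point where the tangential line meets the X axis $T$. The angles $\angle OPT$ and $\angle OQP$ are right angles. And $\angle POQ = \theta$. Let $t$ denote the length of the line segment between point $P$ and point $T$.

Now, split the figure into two triangles:

Since all internal angles of a triangle add up to $180^\circ$ and both triangles have an angle of $90^\circ$ and an angle of $\theta$, the unmarked angles from both triangles, $\angle OTP$ and $\angle OPQ$, are exactly the same: $90^\circ - \theta$.

If two triangles have identical sets of angles, then they are **similar**, i.e. if you proportionally scale, rotate, and/or flip one of them, it can become identical to the other one.

When two triangles are similar, the ratio between the lengths of two sides from one triangle equals the ratio between the lengths of the corresponding sides of the other triangle. Thus:

$$\frac{\overline{PT}}{\overline{OP}} = \frac{\overline{PQ}}{\overline{OQ}}$$

We know that the coordinates of $P$ are $(\cos\theta, \sin\theta)$, so $\overline{OQ} = \cos\theta$ and $\overline{PQ} = \sin\theta$. And we know that $\overline{OP}$ is equal to the radius of the unit circle, so $\overline{OP} = 1$. Now the equation above becomes:

$$\frac{t}{1} = \frac{\sin\theta}{\cos\theta}$$

And we know that $\tan\theta = \frac{\sin\theta}{\cos\theta}$, so we get:

$$t = \tan\theta$$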

We have found the visual representation of !

The absolute value of $\tan\theta$ is the length of the portion of the tangential line between point $P$ and the X axis. Notice that I said *absolute value*, because depending on the signs of $\sin\theta$ and $\cos\theta$, $\tan\theta$ can be positive or negative. This animation highlights (in blue) the line segment whose length is equal to the absolute value of $\tan\theta$:

We’ve seen the plots for $\sin\theta$ and $\cos\theta$ versus $\theta$ in the previous tutorial. Let’s overlay them on top of each other:

And now let’s add $\tan\theta$ into the mix:

Notice how, unlike $\sin\theta$ and $\cos\theta$, the value of $\tan\theta$ is not constrained within the $[-1, 1]$ range. Since $\tan\theta = \frac{\sin\theta}{\cos\theta}$, the absolute value of $\tan\theta$ approaches infinity as $\cos\theta$ approaches zero. Also, unlike $\sin\theta$ and $\cos\theta$, the period of $\tan\theta$ is $\pi$, instead of $2\pi$.

Another thing worth noting is the relationships among the signs of the three basic trigonometric functions. Since $\tan\theta = \frac{\sin\theta}{\cos\theta}$, the sign of $\tan\theta$ is positive when $\sin\theta$ and $\cos\theta$ have the same sign, and is negative otherwise.
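These properties are easy to confirm numerically. The sketch below uses `System.Math` in double precision; `TangentSketch` and `TanFromRatio` are names invented for this example.

```csharp
using System;

public static class TangentSketch
{
    // Tangent computed as the slope sin(theta) / cos(theta).
    // Blows up in magnitude as cos(theta) approaches zero.
    public static double TanFromRatio(double theta)
    {
        return Math.Sin(theta) / Math.Cos(theta);
    }
}
```

For an angle such as 0.7 radians, `TanFromRatio(0.7)` matches `Math.Tan(0.7)` up to rounding, and so does `Math.Tan(0.7 + Math.PI)`, reflecting the period of pi. For an angle in the second quadrant such as 2.0 radians, sine is positive and cosine is negative, so the result is negative.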

Now, let’s try plugging the tangent curve over time into the X coordinate of an object:

    float tan = Mathf.Tan(Rate * Time.time);
    obj.transform.position = new Vector3(tan, 0.0f, 0.0f);

The object comes in fast from the -X direction, slows down a bit, and then runs off fast again towards the +X direction.

We can utilize this motion to create effects like these falling stars:

    float tan = Mathf.Tan(Rate * Time.time);
    obj.transform.position = center + moveDirection * tan;

The acceleration and deceleration are kind of subtle. We can further amplify the effect by raising the tangent function to a power of, say, 3:

    float tan = Mathf.Tan(Rate * Time.time);
    float tan3 = tan * tan * tan;
    obj.transform.position = center + moveDirection * tan3;

So we’ve seen how the three basic trigonometric functions relate to the unit circle. Now we’re going to take a look at their relationships with triangles. They are called trigonometric functions, after all. Specifically, we’re going to look at **right triangles** (triangles with a right angle).

First, let’s get the terminology out of the way. Here is a right triangle with an angle marked as $\theta$:

The side of the triangle between $\theta$ and the right angle is called the **adjacent side**, since it is adjacent to $\theta$. The other side next to the right angle is called the **opposite side**, because it is across from $\theta$. The remaining (also the longest) side, opposite the right angle, is called the **hypotenuse**:

And here is how the three basic trigonometric functions relate to **the lengths** of the triangle sides:

- $\sin\theta$: length of the **opposite side** divided by length of the **hypotenuse**.
- $\cos\theta$: length of the **adjacent side** divided by length of the **hypotenuse**.
- $\tan\theta$: length of the **opposite side** divided by length of the **adjacent side**.

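Or, in mathematical form:

$$\sin\theta = \frac{\text{opposite}}{\text{hypotenuse}}, \qquad \cos\theta = \frac{\text{adjacent}}{\text{hypotenuse}}, \qquad \tan\theta = \frac{\text{opposite}}{\text{adjacent}}$$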

These equations could be a bit too much to remember. Here’s a common verbal mnemonic that might help: **soh-cah-toa** (**s**ine is the **o**pposite side divided by the **h**ypotenuse, **c**osine is the **a**djacent side divided by the **h**ypotenuse, and **t**angent is the **o**pposite side divided by the **a**djacent side).

I did not learn this verbal mnemonic in Taiwan (my math classes were taught in Mandarin). What I learned was a visual mnemonic that I’m quite fond of: Write the **initials** of sine, cosine, and tangent in cursive, along with the right triangle as shown below (please forgive my ugly handwriting).

When you write an initial, the corresponding function equals the length of the **first side you write past** dividing the length of the **second side you write past**:

- $\sin\theta$: length of the **hypotenuse** dividing length of the **opposite side**.
- $\cos\theta$: length of the **hypotenuse** dividing length of the **adjacent side**.
- $\tan\theta$: length of the **adjacent side** dividing length of the **opposite side**.

When describing fractions in Mandarin, instead of saying “A divided by B”, we say “B dividing A”. That’s why this mnemonic orders the divisor before the dividend in its wording. This ordering might not be intuitive to native English speakers, but if you find it useful, then great!

Now back to the equations:

Whatever the size of the right triangle, the equations above always hold true, because ratios between two sides are independent of the absolute lengths of individual sides.

If we scale the triangle so that the hypotenuse is of length 1, then we can fit it back into our unit circle figure, with $(\cos\theta, \sin\theta)$ being the coordinates of the point on the circle:

And the equations above agree nicely with the coordinates of that point, $(\cos\theta, \sin\theta)$:

Knowing the equations for trigonometric functions in terms of lengths of right triangle sides, for any given right triangle with an angle $\theta$, if we know the length of any one side, we can derive the lengths of the other two sides using the three basic trigonometric functions.

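Let $j$ denote the length of the ad**j**acent side, $s$ the length of the oppo**s**ite side, and $h$ the length of the **h**ypotenuse:

If we know the length of the hypotenuse ($h$), then $j = h\cos\theta$, and $s = h\sin\theta$:

If we know the length of the adjacent side ($j$), then $h = \frac{j}{\cos\theta}$, and $s = j\tan\theta$:

If we know the length of the opposite side ($s$), then $h = \frac{s}{\sin\theta}$, and $j = \frac{s}{\tan\theta}$: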
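The three cases can be sketched as three small functions. This is plain C++ for illustration only; `RightTriangle`, `fromHypotenuse`, `fromAdjacent`, and `fromOpposite` are hypothetical names, and the angle is assumed to be in radians and strictly between 0 and $\frac{\pi}{2}$.

```cpp
#include <cassert>
#include <cmath>

// Side lengths of a right triangle, relative to one of its acute angles.
struct RightTriangle
{
    double adjacent;
    double opposite;
    double hypotenuse;
};

// Known hypotenuse: adjacent = h cos(theta), opposite = h sin(theta).
RightTriangle fromHypotenuse(double h, double theta)
{
    assert(h > 0.0);
    return { h * std::cos(theta), h * std::sin(theta), h };
}

// Known adjacent side: opposite = j tan(theta), hypotenuse = j / cos(theta).
RightTriangle fromAdjacent(double j, double theta)
{
    assert(j > 0.0);
    return { j, j * std::tan(theta), j / std::cos(theta) };
}

// Known opposite side: adjacent = s / tan(theta), hypotenuse = s / sin(theta).
RightTriangle fromOpposite(double s, double theta)
{
    assert(s > 0.0);
    return { s / std::tan(theta), s, s / std::sin(theta) };
}
```

As a sanity check, take the classic 3-4-5 triangle with `theta = std::atan2(4.0, 3.0)`. Calling `fromHypotenuse(5.0, theta)` should give an adjacent side of 3 and an opposite side of 4, up to floating-point error.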

Finally, it’s time for practical examples! Let’s see how we can simulate cannonballs when given an initial speed, a horizontal angle, and an elevation angle. Also, let’s find out how we can display the predicted trajectories even before firing the cannon.

But before all that, here’s a very quick recap of some basic terminology in motion dynamics. An object’s **position** is where the object is physically located. An object’s **velocity** is the rate of change in its position (typically expressed as change of position per second). An object’s **acceleration** is the rate of change in its velocity (typically expressed as change of velocity per second).

The Euler Method is a quick and easy algorithm for simulating object movement: For each moving object, we store its velocity vector along with its position. For each update, or **time step**, we change the velocity by acceleration times **delta time** (the time difference between each update), and then we change the position by velocity times delta time:

    velocity += acceleration * deltaTime;
    position += velocity * deltaTime;

To simulate gravity at ground level and at human scale, we let the acceleration be a constant downward-pointing vector. Here’s an example of how an object would move in 2D under the influence of gravity when starting off with an initial velocity pointing up and to the right, simulated using the Euler Method:

If we simulate the entire trajectory within a single frame by performing multiple time steps, and draw a little dot once every several iterations, we can get ourselves a nice indicator of the predicted trajectory:

    velocity = initialVelocity;
    position = initialPosition;
    for (int i = 0; i < NumIterations; ++i)
    {
        // advance the simulation by one time step
        velocity += acceleration * deltaTime;
        position += velocity * deltaTime;

        // only draw a dot once every several iterations
        if (i % IterationsPerDot != 0)
            continue;

        DrawDot(position);
    }
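If you’d like to try the trajectory prediction outside Unity, here is a minimal 2D sketch in plain C++. `Vec2` and `predictTrajectory` are hypothetical names for this illustration. It collects the dot positions into a vector instead of drawing them, and it updates velocity before position, the same order as the snippet above.

```cpp
#include <cassert>
#include <vector>

struct Vec2
{
    double x;
    double y;
};

// Steps the simulation numSteps times and records the position
// once every stepsPerDot steps (starting with the first step).
std::vector<Vec2> predictTrajectory(Vec2 position, Vec2 velocity, Vec2 acceleration,
                                    double dt, int numSteps, int stepsPerDot)
{
    assert(dt > 0.0 && stepsPerDot > 0);

    std::vector<Vec2> dots;
    for (int i = 0; i < numSteps; ++i)
    {
        // change velocity by acceleration times delta time
        velocity.x += acceleration.x * dt;
        velocity.y += acceleration.y * dt;

        // change position by the updated velocity times delta time
        position.x += velocity.x * dt;
        position.y += velocity.y * dt;

        if (i % stepsPerDot == 0)
            dots.push_back(position);
    }
    return dots;
}
```

With a smaller `dt` and proportionally more steps, the dots trace the same arc more accurately, at the cost of more iterations per frame.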

Now, let’s compute the **initial velocity** of a cannonball if it is fired from the cannon at an initial speed (length of the initial velocity vector), a horizontal angle $\theta$, and an elevation angle $\phi$ (phi). Let the $+Z$ direction be the cannon’s forward direction and the $+X$ direction be its right direction (Unity uses left-handed coordinates).

To compute the initial velocity, we need to first compute a **unit vector** (vector of length 1) in the same direction. Once we have that unit vector, we can simply multiply all its components by the desired speed to obtain the initial velocity vector.

The diagram below shows a unit vector in the $+X$ direction in red, a unit vector in the $+Y$ direction in green, a unit vector in the $+Z$ direction in blue, a unit vector in the direction of the initial velocity in black, a unit vector in the horizontal direction of the initial velocity in gray, the horizontal angle $\theta$ (between the blue and gray vectors), and the elevation angle $\phi$ (between the gray and black vectors):

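The goal is to find the unit vector in the direction of the initial velocity and multiply it by the initial speed. We can isolate the unit vectors and angles from the diagram above into two unit circle diagrams.

One is a horizontal unit circle diagram with the $+X$ unit vector, the $+Z$ unit vector, the horizontal unit vector, and $\theta$:

And the other one is a vertical unit circle diagram with the $+Y$ unit vector, the horizontal unit vector, the unit vector in the direction of the initial velocity, and $\phi$: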

If we view the first (horizontal) unit circle diagram from a different angle, we’ll get a familiar view of a flat unit circle:

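We’ve done this math before. The component of the horizontal unit vector in the $+X$ direction is of length $\sin\theta$, and the component in the $+Z$ direction is of length $\cos\theta$. This gives us the horizontal unit vector $(\sin\theta, 0, \cos\theta)$.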

Now, view the second (vertical) unit circle diagram from a different angle that gives us the same familiar view of a flat unit circle:

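It’s the same drill. The component of the initial velocity direction along the horizontal unit vector is of length $\cos\phi$, and the component in the $+Y$ direction is of length $\sin\phi$, so we can now compute the unit vector in the direction of the initial velocity: $(\cos\phi\sin\theta, \sin\phi, \cos\phi\cos\theta)$.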

Multiplying this unit vector by the initial speed gives us our initial velocity vector:

And the corresponding code is:

    Vector3 ComputeInitialVelocity()
    {
        float sinTheta = Mathf.Sin(HorizontalAngle);
        float cosTheta = Mathf.Cos(HorizontalAngle);
        float sinPhi = Mathf.Sin(ElevationAngle);
        float cosPhi = Mathf.Cos(ElevationAngle);

        // unit direction scaled by the initial speed
        return
            InitialSpeed
            * new Vector3
              (
                  cosPhi * sinTheta,
                  sinPhi,
                  cosPhi * cosTheta
              );
    }
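One way to convince yourself that the direction vector really has length 1 is to check it numerically. Here is a minimal sketch in plain C++ with a hand-rolled `Vec3`. The names `launchDirection`, `initialVelocity`, and `length` are hypothetical, and both angles are assumed to be in radians.

```cpp
#include <cassert>
#include <cmath>

struct Vec3
{
    double x;
    double y;
    double z;
};

// Unit vector for a horizontal angle theta (measured from +Z towards +X)
// and an elevation angle phi (measured up from the XZ plane).
Vec3 launchDirection(double theta, double phi)
{
    return { std::cos(phi) * std::sin(theta),
             std::sin(phi),
             std::cos(phi) * std::cos(theta) };
}

// Scales the unit direction by the desired speed.
Vec3 initialVelocity(double speed, double theta, double phi)
{
    assert(speed >= 0.0);
    Vec3 d = launchDirection(theta, phi);
    return { speed * d.x, speed * d.y, speed * d.z };
}

double length(const Vec3 &v)
{
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}
```

The length comes out as 1 for any pair of angles because $\cos^2\phi\,(\sin^2\theta + \cos^2\theta) + \sin^2\phi = \cos^2\phi + \sin^2\phi = 1$. With both angles at zero, the cannonball flies straight along $+Z$.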

Being able to compute the initial velocity vector from a given initial speed, horizontal angle, and elevation angle, we are now well-equipped to simulate a cannonball:

    void FireCannon()
    {
        velocity = ComputeInitialVelocity();
        obj.transform.position = InitialPosition;
    }

    void Update()
    {
        float dt = Time.deltaTime;
        velocity += acceleration * dt;
        obj.transform.position += velocity * dt;
    }

    void DrawTrajectory()
    {
        float dt = Time.fixedDeltaTime;

        // local copies, so the prediction doesn't disturb the live cannonball
        Vector3 velocity = ComputeInitialVelocity();
        Vector3 position = InitialPosition;
        for (int i = 0; i < NumIterations; ++i)
        {
            velocity += acceleration * dt;
            position += velocity * dt;

            if (i % IterationsPerDot != 0)
                continue;

            DrawDot(position);
        }
    }

Now that we can fire cannonballs, let’s place some targets. If we want to place a target at a given horizontal distance away from the cannon, as well as at a given elevation angle, where exactly should we place the targets?

Below is the desired end result. Each target is at a fixed horizontal distance (on the XZ plane) away from the cannon, and is at a fixed elevation angle above ground. The targets are also equally spaced out horizontally, i.e. their horizontal angles relative to the cannon are equally spaced out.

We already know how to compute a horizontal unit vector from a horizontal angle $\theta$. The horizontal unit vector is $(\sin\theta, 0, \cos\theta)$. Multiplying this horizontal vector by a given horizontal distance gives us the horizontal offset vector of the target from the cannon. Equally spacing out different values of $\theta$ and computing the horizontal offset vector for each value gives us the XZ coordinates of the targets (shown as red dots in the image below):

The last step is to determine the Y coordinates of the targets, i.e. how far off the ground the targets should be. Recall that if we know the length of the adjacent side to an angle $\theta$ of a right triangle to be $j$, then the length of the opposite side is $j\tan\theta$.

Substituting $j$ with the given horizontal distance and $\theta$ with the elevation angle $\phi$, the formula for the Y coordinate of the targets becomes the horizontal distance multiplied by $\tan\phi$.

We can finally place our targets at the desired positions:

    // start at the leftmost target, so the targets are centered on the forward direction
    float theta = -0.5f * AngleInterval * (NumTargets - 1);
    float elevationTan = Mathf.Tan(ElevationAngle);
    foreach (var target in targetArray)
    {
        Vector3 horizontalVec =
            HorizontalDistance
            * new Vector3
              (
                  Mathf.Sin(theta),
                  0.0f,
                  Mathf.Cos(theta)
              );
        theta += AngleInterval;

        Vector3 verticalVec =
            HorizontalDistance * elevationTan * Vector3.up;

        target.transform.position =
            Cannon.position + horizontalVec + verticalVec;
    }

We haven’t talked about how to detect when a cannonball hits a target or the ground yet. Right now the cannonballs would just go through the targets:

Collision detection is beyond the scope of this tutorial, so I’ll just go over the very basics of sphere-sphere collision really quick.

To detect when a cannonball hits the target, check the distance between the centers of the two and see if it’s less than the sum of their radii. If the cannonball does collide with a target, we destroy the cannonball and the target.

    Vector3 cannonballToTargetVec =
        target.transform.position - cannonball.transform.position;
    float cannonballToTargetDist = cannonballToTargetVec.magnitude;
    float radiusSum = cannonballRadius + targetRadius;
    if (cannonballToTargetDist < radiusSum)
    {
        DestroyCannonball();
        DestroyTarget();
    }
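For reference, here is the same sphere-sphere test as a standalone sketch in plain C++. Note one deliberate difference from the snippet above: it compares squared distances, a common variation that avoids the square root. `Vec3` and `spheresOverlap` are hypothetical names for this illustration.

```cpp
#include <cassert>

struct Vec3
{
    double x;
    double y;
    double z;
};

// True when the distance between the centers is less than the sum of
// the radii. Both sides of the comparison are squared, so no square
// root is needed.
bool spheresOverlap(const Vec3 &centerA, double radiusA,
                    const Vec3 &centerB, double radiusB)
{
    assert(radiusA >= 0.0 && radiusB >= 0.0);

    double dx = centerB.x - centerA.x;
    double dy = centerB.y - centerA.y;
    double dz = centerB.z - centerA.z;
    double distSq = dx * dx + dy * dy + dz * dz;

    double radiusSum = radiusA + radiusB;
    return distSq < radiusSum * radiusSum;
}
```

Because the comparison is strict, two spheres that exactly touch are not reported as colliding, which matches the `<` in the Unity snippet.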

Using a similar technique when drawing the predicted trajectory, we can terminate the trajectory early when it hits a target.

However, this collision detection technique is **discrete**, meaning that the cannonball can still go through targets if it travels fast enough. We can mitigate this problem with a **continuous** collision detection technique, but that is also beyond the scope of this tutorial and will be touched on in later tutorials.

Previously, we have been introduced to two basic trigonometric functions: sine and cosine. In this tutorial, we have seen a geometric interpretation of another trigonometric function: tangent. We have also learned the relationship among sine, cosine, and tangent, in the context of the unit circle, as well as right triangles.

Next, we have plotted the tangent function alongside sine and cosine, and we are now able to create smooth intro and outro motion by utilizing the tangent function.

Finally, using the three basic trigonometric functions, we have learned how to predict and simulate the trajectory of a cannonball, given an initial speed and elevation angle. Plus, we have seen how to place targets, given a horizontal distance and an elevation angle.

We have learned the basics of the three fundamental trigonometric functions that are essential in solving daily gamedev problems. In later tutorials, I will go over more useful mathematical tools that are built on top of these trigonometric functions, as well as some of their practical applications.

If you’ve enjoyed this tutorial and would like to see more, please consider supporting me on Patreon. By doing so, you can also get updates on future tutorials. Thanks!


Trigonometry is an essential building block for a huge portion of game math. That’s why I’ve chosen this topic for the first tutorial of my new Gamedev Tutorials series. Having a solid understanding of the basics of trigonometry can go a long way in game development. It is used extensively in game problem solving.

In this tutorial you’ll learn:

- A geometric interpretation of two basic trigonometric functions: sine & cosine.
- The comparison of two different angle units: degrees & radians.
- Some basic properties of sine & cosine.
- How to move and arrange things in a circular fashion:

- How to move things in a spiral fashion:

- How to create simple harmonic motion:

- How to create damped spring motion:

- How to create pendulum motion:

- How to generate hovering motion:

Let’s look at the **unit circle**, a circle with a radius of 1 centered at the origin.

Now pick a point on the circle. The line segment between this point and the origin forms an angle $\theta$ (theta) with the X axis (positive X direction).

What’s shown here is actually a way to geometrically express the 2 basic trigonometric functions: $\sin\theta$ (sine of theta) and $\cos\theta$ (cosine of theta). The coordinates of this point are exactly $(\cos\theta, \sin\theta)$.

So, to recap, the two trigonometric functions $\sin\theta$ and $\cos\theta$ are, respectively, the Y and X coordinates of a point on the unit circle, where $\theta$ is the angle from the X axis (positive X direction) to the line segment between the point and the origin.

Since $\sin$ and $\cos$ are functions, the proper notation should include parentheses around the input: $\sin(\theta)$ and $\cos(\theta)$, but many people and much of the literature just drop the parentheses and write them as $\sin\theta$ and $\cos\theta$.

$\sin$ and $\cos$ are functions that take one single input (an angle) and output a single value between -1 and 1. If you think about it, this output range makes sense: the X and Y coordinates of a point on the unit circle can never go outside of the $[-1, 1]$ range.

If the angle increases at a constant rate, we can plot the value of the point’s X and Y coordinates individually over time.

If we compare the plots side-by-side in the form of angle-vs-value, we can see they are the same periodic curve in a wave-like shape, but offset by a quarter of a period from each other.

The period of these functions is $360^\circ$, so $\sin(\theta + 360^\circ)$ gives the same value as $\sin\theta$ (and likewise for cosine). This makes sense, because rotating an extra $360^\circ$ past $\theta$ would bring us back to the same angle.

The angle passed into the trigonometric functions can be in two different units: **degrees** and **radians**. Most people are familiar with degrees and their upper-little-circle notation. For instance, the right angle (90 degrees) is written as $90^\circ$. Do beware that $90^\circ$ is not the same as $90$. If the degree notation is not present, the angle’s unit is actually regarded as **radians**.

$180^\circ$ is equivalent to $\pi$ (pi) radians, where $\pi$ is the famous mathematical constant, “the ratio of a circle’s circumference to its diameter”, approximately equal to 3.14. Hence, an angle of 1 radian is approximately $57.3^\circ$, almost $60^\circ$. As a sanity check, entering $\cos(1)$ on an engineering calculator (with angle units set to radians) will give us approximately $0.54$, which is indeed close to $\cos 60^\circ = 0.5$.

Here are some common degree-to-radian mappings:

In Unity, the $\sin$ and $\cos$ functions are called via `Mathf.Sin` and `Mathf.Cos`, respectively. Beware that these functions take input in radians, so if you want to compute $\cos 45^\circ$, don’t write:

    // this is actually cosine of 45 radians!
    float cos45Deg = Mathf.Cos(45.0f);

45 radians is about $2578^\circ$. A full revolution of $360^\circ$ is equivalent to no rotation at all. $2578^\circ$ divided by $360^\circ$ gives us a remainder of $58^\circ$, which is the equivalent angle of 45 radians and is different from $45^\circ$.

To compute $\cos 45^\circ$, write this instead:

    // convert to radians
    float cos45Deg = Mathf.Cos(45.0f * Mathf.PI / 180.0f);

Or use constants that help convert between degrees and radians.

float cos45Deg = Mathf.Cos(45.0f * Mathf.Deg2Rad);
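Outside Unity there is no `Mathf.Deg2Rad`, but the conversion is a one-liner. The sketch below is plain C++; `kPi`, `degToRad`, `radToDeg`, and `wrapDegrees` are hypothetical helpers written for this illustration. It also reproduces the “45 radians is about 58 degrees” arithmetic from above.

```cpp
#include <cassert>
#include <cmath>

// Assumed constant for illustration; C++ before C++20 has no standard pi.
constexpr double kPi = 3.14159265358979323846;

double degToRad(double degrees)
{
    return degrees * kPi / 180.0;
}

double radToDeg(double radians)
{
    return radians * 180.0 / kPi;
}

// Wraps an angle in degrees into the [0, 360) range by discarding
// full revolutions.
double wrapDegrees(double degrees)
{
    assert(std::isfinite(degrees));
    double wrapped = std::fmod(degrees, 360.0);
    return wrapped < 0.0 ? wrapped + 360.0 : wrapped;
}
```

`std::cos(degToRad(45.0))` gives the cosine of 45 degrees, while `wrapDegrees(radToDeg(45.0))` shows where an angle of 45 radians actually ends up: a little over 58 degrees.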

In tools like the Unity editor, expressing angles in degrees is more user friendly, because most people can immediately picture what a $45^\circ$ angle looks like. However, in the context of math and programming, many people, myself included, prefer sticking with radians.

One useful thing about radians is that it trivializes calculating arc length from a given radius and angle. Let’s say we want to calculate the length of an arc of , or radians, from a circle of radius 2.

If computed using degrees, first the whole circumference is calculated using the formula $2\pi r$, and then it’s multiplied by the ratio of the arc’s angle out of $360^\circ$:

When using radians, the arc length formula is simply **radius times angle in radians**:

The circle’s circumference formula agrees nicely with the arc length formula in radians. Since one full circle is basically an arc with an angle of $2\pi$ radians, the length of such an arc is $2\pi r$, exactly the same as the circle’s circumference formula.
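Here are the two ways of computing arc length side by side, as a minimal sketch in plain C++. The function names and `kPi` are hypothetical, and the 90-degree angle mentioned below is just a value I picked for illustration.

```cpp
#include <cassert>
#include <cmath>

// Assumed constant for illustration; C++ before C++20 has no standard pi.
constexpr double kPi = 3.14159265358979323846;

// Radians: arc length is simply radius times angle.
double arcLengthRadians(double radius, double angleRadians)
{
    assert(radius >= 0.0);
    return radius * angleRadians;
}

// Degrees: the whole circumference, times the fraction of the circle covered.
double arcLengthDegrees(double radius, double angleDegrees)
{
    assert(radius >= 0.0);
    double circumference = 2.0 * kPi * radius;
    return circumference * (angleDegrees / 360.0);
}
```

For a circle of radius 2, a 90-degree arc and a $\frac{\pi}{2}$-radian arc both come out to a length of $\pi$, and passing $2\pi$ radians to `arcLengthRadians` reproduces the full circumference.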

Now let’s look at some basic properties of sine & cosine that can come in handy in future mathematical derivations.

Since $(\cos\theta, \sin\theta)$ are coordinates of a point on the unit circle, the point’s distance from the origin is always 1, regardless of the angle $\theta$. The Pythagorean theorem states that the distance of the point $(x, y)$ from the origin is $\sqrt{x^2 + y^2}$. From there we can get this identity (equation that is always true):

$$\cos^2\theta + \sin^2\theta = 1$$

The squares of $\sin\theta$ and $\cos\theta$ are written as $\sin^2\theta$ and $\cos^2\theta$, respectively. People write them that way probably because they are too lazy to write $(\sin\theta)^2$ and $(\cos\theta)^2$.

Recall the side-by-side comparison of the sine and cosine plots.

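You can see that the cosine curve basically is the sine curve shifted to the left by $90^\circ$, or $\frac{\pi}{2}$ radians. This means we can get these identities that convert between the two:

$$\cos\theta = \sin\left(\theta + \frac{\pi}{2}\right), \qquad \sin\theta = \cos\left(\theta - \frac{\pi}{2}\right)$$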
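If you don’t feel like taking identities on faith, they are cheap to spot-check. The sketch below is plain C++ with hypothetical helper names. It asserts, at a single angle and within a small tolerance, that $\cos^2\theta + \sin^2\theta = 1$ and that sine and cosine are quarter-turn shifts of each other.

```cpp
#include <cassert>
#include <cmath>

// Assumed constant for illustration; C++ before C++20 has no standard pi.
constexpr double kPi = 3.14159265358979323846;

// Cosine expressed as a sine shifted by a quarter turn.
double cosViaShiftedSine(double theta)
{
    return std::sin(theta + kPi / 2.0);
}

// Sine expressed as a cosine shifted by a quarter turn the other way.
double sinViaShiftedCosine(double theta)
{
    return std::cos(theta - kPi / 2.0);
}

// Checks the identities at one angle; aborts if any of them fails.
void checkIdentities(double theta)
{
    const double eps = 1e-12;
    double s = std::sin(theta);
    double c = std::cos(theta);
    assert(std::fabs(c * c + s * s - 1.0) < eps);
    assert(std::fabs(cosViaShiftedSine(theta) - c) < eps);
    assert(std::fabs(sinViaShiftedCosine(theta) - s) < eps);
}
```

A spot check at a handful of angles is not a proof, of course, but it is a quick way to catch a sign or direction mistake when you write an identity down from memory.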

Now that we’ve seen that $(\cos\theta, \sin\theta)$ are 2D coordinates of a point on the unit circle, we can start playing with some basic circular motion in Unity.

The code below moves an object around a circle at a constant rate:

    obj.transform.position =
        new Vector3
        (
            Radius * Mathf.Cos(Rate * Time.time),
            Radius * Mathf.Sin(Rate * Time.time),
            0.0f
        );

The code below moves 12 objects around a circle at a constant rate, and the objects are equally spaced out around the circle:

    // angle shared by all objects; each object adds its own offset below
    float baseAngle = Rate * Time.time;
    for (int i = 0; i < 12; ++i)
    {
        float angleOffset = 2.0f * Mathf.PI * i / 12.0f;
        aObj[i].transform.position =
            new Vector3
            (
                Radius * Mathf.Cos(baseAngle + angleOffset),
                Radius * Mathf.Sin(baseAngle + angleOffset),
                0.0f
            );
    }

Combining circular motion with movement in the Z direction, we can create a spiral motion in 3D:

    obj.transform.position =
        new Vector3
        (
            Radius * Mathf.Cos(Rate * Time.time),
            Radius * Mathf.Sin(Rate * Time.time),
            ZSpeed * Time.time
        );

We’ve seen this plot of cosine versus angle:

What if we plug cosine into the X coordinate of an object?

    float x = Mathf.Cos(Rate * Time.time);
    obj.transform.position = new Vector3(x, 0.0f, 0.0f);

This is what we get:

This kind of oscillating motion that matches a sine-shaped curve, a.k.a. sinusoid, is known as simple harmonic motion, or S.H.M.

Since time starts at zero, the object’s X coordinate starts at $\cos(0) = 1$. If we use $\sin$ instead, the X coordinate would start at $\sin(0) = 0$.

The input angle passed in to the sine and cosine functions is called the **phase**. Typically, if the phase passed in is a constant multiple of time, many people write it as $\omega t$, where $\omega$ (omega) is called the **angular frequency** (in **radians per second**), and $t$ is the time. For example, $\cos(2\pi t)$ would produce a simple harmonic motion that oscillates one full cycle every second.
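To make the link between cycles per second and angular frequency concrete, here is a minimal sketch in plain C++. `kPi`, `angularFrequencyFromHz`, and `harmonicPosition` are hypothetical names for this illustration, and the motion is assumed to use cosine, like the Unity snippet above.

```cpp
#include <cassert>
#include <cmath>

// Assumed constant for illustration; C++ before C++20 has no standard pi.
constexpr double kPi = 3.14159265358979323846;

// Converts a frequency in cycles per second to radians per second.
// One full cycle is 2 pi radians.
double angularFrequencyFromHz(double cyclesPerSecond)
{
    assert(cyclesPerSecond >= 0.0);
    return 2.0 * kPi * cyclesPerSecond;
}

// Displacement of a simple harmonic motion at time t (in seconds).
double harmonicPosition(double amplitude, double angularFrequency, double t)
{
    return amplitude * std::cos(angularFrequency * t);
}
```

With an angular frequency of `angularFrequencyFromHz(1.0)`, the displacement starts at the amplitude at `t = 0`, reaches the negated amplitude half a second later, and is back where it started at `t = 1`.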

What if we scale this motion by an exponentially decreasing factor?

    // scale factor that halves every (1 / Decay) seconds
    float s = Mathf.Pow(0.5f, Decay * Time.time);
    float x = Mathf.Cos(Rate * Time.time);
    obj.transform.position = new Vector3(s * x, 0.0f, 0.0f);

Now the object moves in a **damped spring motion**:

Instead of plugging a sinusoid into an object’s X coordinate, what if we plug it into the angle for the circular motion example above?

    float baseAngle = 1.5f * Mathf.PI;       // 270 degrees
    float halfAngleRange = 0.25f * Mathf.PI; // 45 degrees
    float c = Mathf.Cos(Rate * Time.time);
    float angle = halfAngleRange * c + baseAngle;
    obj.transform.position =
        new Vector3
        (
            Radius * Mathf.Cos(angle),
            Radius * Mathf.Sin(angle),
            0.0f
        );

The object now moves in a **pendulum motion**:

We can treat this as the circular motion’s angle being in a simple harmonic motion.

As a bonus example, here is UFO Bunny, a character from Boing Kit, my bouncy VFX extension for Unity.

We can apply staggered simple harmonic motion to her X, Y, and Z coordinates separately.

    Vector3 hover =
        new Vector3
        (
            RadiusX * Mathf.Sin(RateX * Time.time + OffsetX),
            RadiusY * Mathf.Sin(RateY * Time.time + OffsetY),
            RadiusZ * Mathf.Sin(RateZ * Time.time + OffsetZ)
        );
    obj.transform.position = basePosition + hover;

And this creates a hovering motion.

And the hover offset can be used to compute a tilt rotation. This is beyond the scope of this tutorial, so I’ll just leave the code and results here.

    obj.transform.rotation =
        baseRotation
        * Quaternion.FromToRotation
          (
              Vector3.up,
              -hover + 3.0f * Vector3.up
          );

That’s it!

We have seen how $\sin\theta$ and $\cos\theta$ can be geometrically defined as coordinates of a point on the unit circle.

Also, we have seen the difference between the two angle units: degrees and radians.

Finally, we now know how to move things in circles and spirals, as well as how to oscillate things in simple harmonic motion, damped spring motion, pendulum motion, and hovering motion.

I hope this tutorial has helped you get a better understanding of the 2 basic trigonometric functions: sine & cosine.

In the next tutorial, I will introduce one additional basic trigonometric function: **tangent**, as well as talk about more applications of all these 3 functions.

Until then!

Over time, I have adopted several coding patterns for writing readable and debuggable multi-condition game code, which I systematically follow.

By multi-condition code, I’m talking about code where certain final logic is executed, after multiple condition tests have successfully passed; also, if a test fails, further tests will be skipped for efficiency, effectively achieving short-circuit evaluation.

**NOTE:** I only mean to use this post to share some coding patterns I find useful, and I do NOT intend to suggest nor claim that they are “the only correct ways” to write code. Also, please note that it’s not all black and white. I don’t advocate unconditionally adhering to these patterns; I sometimes mix them with opposite patterns if it makes more sense and makes code more readable.

**Shortcuts:**

Early Outs

Debuggable Conditions

Debug Draw Locality

Forcing All Debug Draws

There are two ends of the spectrum regarding this subject: **early outs** versus **single point of return**. Both camps have pretty valid arguments, but I lean more towards the early-out camp. The early-out style is the foundation of all the patterns presented in this post.

As an example, consider the scenario where we need to test if a character is facing the right direction, if the weapon is ready, and if the path is clear, before finally executing an attack.

This is one way to write it:

    FacingData facingData = PrepareFacingData();
    if (TestFacing(facingData))
    {
        WeaponData weaponData = PrepareWeaponData();
        if (TestWeaponReady(weaponData))
        {
            PathData pathData = PreparePathData();
            if (TestPathClear(pathData))
            {
                Attack();
            }
        }
    }

The code block above is written in the so-called **single-point-of-return** style. The logic flow is straightforward and always ends at the bottom of the code block. If the code block is wrapped inside a function, the function will have one single point of return, i.e. at the bottom of the function:

    void TryAttack()
    {
        FacingData facingData = PrepareFacingData();
        if (TestFacing(facingData))
        {
            WeaponData weaponData = PrepareWeaponData();
            if (TestWeaponReady(weaponData))
            {
                PathData pathData = PreparePathData();
                if (TestPathClear(pathData))
                {
                    Attack();
                }
            }
        }

        // return here
    }

The same logic in **early-out** style would look something like this, returning right where the first test fails:

    void TryAttack()
    {
        FacingData facingData = PrepareFacingData();
        if (!TestFacing(facingData))
            return;

        WeaponData weaponData = PrepareWeaponData();
        if (!TestWeaponReady(weaponData))
            return;

        PathData pathData = PreparePathData();
        if (!TestPathClear(pathData))
            return;

        Attack();
    }

I like this style better, because the intent to get out of the function right after the first failing test is very clear. Also, as the number of tests gets large, it avoids excessive indentation, which many jokingly call “indent hadouken.”

PHP Streetfighter #php pic.twitter.com/5dQ5H1UrB2

Paul Dragoonis (@dr4goonis) June 11, 2014

Early outs don’t just apply to functions. They apply to loops as well. For example, this is how I would iterate through characters and collect those who can attack:

    for (Character &c : characters)
    {
        FacingData facingData = PrepareFacingData(c);
        if (!TestFacing(facingData))
            continue;

        WeaponData weaponData = PrepareWeaponData(c);
        if (!TestWeaponReady(weaponData))
            continue;

        PathData pathData = PreparePathData(c);
        if (!TestPathClear(pathData))
            continue;

        charactersWhoCanAttack.Add(c);
    }

One special case where an early out is not quite possible without refactoring is when there’s more code that must always be executed after the condition tests.

In the single-point-of-return style, it may look like this:

    void TryAttack()
    {
        FacingData facingData = PrepareFacingData();
        if (TestFacing(facingData))
        {
            WeaponData weaponData = PrepareWeaponData();
            if (TestWeaponReady(weaponData))
            {
                PathData pathData = PreparePathData();
                if (TestPathClear(pathData))
                {
                    Attack();
                }
            }
        }

        // always execute
        PostAttackTry();
    }

But using the early-out style, we’d have a problem:

    void TryAttack()
    {
        FacingData facingData = PrepareFacingData();
        if (!TestFacing(facingData))
            return;

        WeaponData weaponData = PrepareWeaponData();
        if (!TestWeaponReady(weaponData))
            return;

        PathData pathData = PreparePathData();
        if (!TestPathClear(pathData))
            return;

        Attack();

        // Uh, oh. Not always executed!
        PostAttackTry();
    }

If you’re comfortable with using forward-only `goto`s to jump to the `PostAttackTry` call and your team’s coding standards allow it, then you’re set. Otherwise, we need to keep looking for solutions.

What about calling `PostAttackTry` wherever the function returns?

    void TryAttack()
    {
        FacingData facingData = PrepareFacingData();
        if (!TestFacing(facingData))
        {
            PostAttackTry();
            return;
        }

        WeaponData weaponData = PrepareWeaponData();
        if (!TestWeaponReady(weaponData))
        {
            PostAttackTry();
            return;
        }

        PathData pathData = PreparePathData();
        if (!TestPathClear(pathData))
        {
            PostAttackTry();
            return;
        }

        Attack();

        PostAttackTry();
    }

This is also not good. The moment someone adds a new return while forgetting to also add a call to `PostAttackTry`, the logic breaks.

In this case, if the logic is trivial, I’d be okay with just using the single-point-of-return style. Otherwise, I would refactor the tests into a separate function, while maintaining the early-out style:

    bool CanAttack()
    {
        FacingData facingData = PrepareFacingData();
        if (!TestFacing(facingData))
            return false;

        WeaponData weaponData = PrepareWeaponData();
        if (!TestWeaponReady(weaponData))
            return false;

        PathData pathData = PreparePathData();
        if (!TestPathClear(pathData))
            return false;

        return true;
    }

    void TryAttack()
    {
        if (CanAttack())
            Attack();

        // always executed, whether or not the attack happened
        PostAttackTry();
    }

[EDIT]

I’ve received feedback proposing the use of destructors of some helper class to invoke the final logic, a la scope-based resource management (RAII).

For me, the acceptable instances that rely on destructors like this are scope-based resource management, profiler instrumentation, and whatever logic comes in the form of tightly coupled “entry” and “exit” logic pairs, but not this.

In my acceptable cases, an exit logic is ensured to always accompany its corresponding entry logic, taking the burden off programmers by preventing them from accidentally exiting a scope without executing the exit logic.

I think relying on destructors to execute arbitrary logic upon scope exit that is not coupled with an entry logic induces unnecessary complexity and risk of omission. When reading the code, instead of making a mental note of “something is executed here, and the accompanying exit logic will be executed upon scope exit”, the readers now have to remember that “nothing is done yet, and only when the scope is exited will something happen.” To me, the latter is a mental burden that is more likely to be overlooked, because it’s not tied to any concrete logic execution right at the code location where the helper struct is constructed.

[/EDIT]

Continuing with the character attack example from above, let’s consider the scenario where each test function now returns a results struct that contains a success flag indicating whether the test has passed, as well as extra info gathered from the test that is useful for debugging purposes.
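To make the idea of a results struct concrete, here is one hypothetical shape it could take for the facing test, as a self-contained C++ sketch. Only the names `FacingData`, `FacingResults`, `TestFacing`, and `IsFacingValid` come from the examples in this post. The fields, the `GetAngleRadians` accessor, and the angle-based test itself are assumptions made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical input to the facing test (2D for brevity).
struct FacingData
{
    double facingX, facingY;     // unit vector the character is facing
    double toTargetX, toTargetY; // unit vector from the character to the target
    double maxAngleRadians;      // widest acceptable angle between the two
};

// Success flag plus extra info that is handy to inspect in a debugger.
class FacingResults
{
public:
    FacingResults(bool valid, double angleRadians)
        : m_valid(valid), m_angleRadians(angleRadians) { }

    bool IsFacingValid() const { return m_valid; }
    double GetAngleRadians() const { return m_angleRadians; }

private:
    bool m_valid;
    double m_angleRadians;
};

FacingResults TestFacing(const FacingData &data)
{
    assert(data.maxAngleRadians >= 0.0);

    // the dot product of two unit vectors is the cosine of the angle between them
    double dot = data.facingX * data.toTargetX + data.facingY * data.toTargetY;

    // clamp to guard acos against floating-point drift outside [-1, 1]
    dot = std::fmin(1.0, std::fmax(-1.0, dot));

    double angle = std::acos(dot);
    return FacingResults(angle <= data.maxAngleRadians, angle);
}
```

The point is that the caller gets back more than a yes or no: when the test fails, the measured angle is sitting right there in the results struct, ready to be inspected or drawn.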

This is what the code might look like:

    void TryAttack()
    {
        FacingData facingData = PrepareFacingData();
        if (!TestFacing(facingData).IsFacingValid())
            return;

        WeaponData weaponData = PrepareWeaponData();
        if (!TestWeaponReady(weaponData).IsWeaponReady())
            return;

        PathData pathData = PreparePathData();
        if (!TestPathClear(pathData).IsPathClear())
            return;

        Attack();
    }

It looks good and all, but there’s one specific issue that technically doesn’t lie in the code itself, but it affects the programmer’s experience when debugging this piece of code.

Embedding the test function’s return value within `if` conditions like this means that if we want to set a breakpoint and peek inside the extra info in the results structs, we’d have to step into the individual tests, step out of the tests, and then look at the returned value.

Visual Studio supports this feature inside the autos window:

I find it slightly annoying to have to step in and then step out of test functions, just to inspect their results structs. So, I normally assign the return values to local variables, and then perform `if` checks on those variables instead.

This way, I can just step over the test function calls and inspect the results structs, without having to step in and out of the test functions:

    void TryAttack()
    {
        FacingData facingData = PrepareFacingData();
        FacingResults facingResults = TestFacing(facingData);
        if (!facingResults.IsFacingValid())
            return;

        WeaponData weaponData = PrepareWeaponData();
        WeaponResults weaponResults = TestWeaponReady(weaponData);
        if (!weaponResults.IsWeaponReady())
            return;

        PathData pathData = PreparePathData();
        PathResults pathResults = TestPathClear(pathData);
        if (!pathResults.IsPathClear())
            return;

        Attack();
    }

Even if we’ve stepped past some tests, extra info regarding the tests is still available in the locals window, which is a convenience I now cannot live without:

As a side note, if the test is just a trivial expression, I think embedding it in the `if` condition is totally fine, and it can actually make the code cleaner.

**NOTE:** For simplicity’s sake, code mechanisms to effectively strip debug draws in release builds are omitted in this post (e.g. `#ifdef`s, macros, flags, etc.).

Sometimes we need to debug draw based on test results, to show why a test succeeded or failed. That’s what the results structs returned from test functions are for.

I like to keep debug draw code close to the related condition tests, if not inside the test functions themselves. In my opinion, this makes the code cleaner and easier to read in independent chunks.

It’s perfectly fine if leaving certain debug draw logic inside the test functions themselves makes more sense. However, there are cases where drastically different debug draws are desired at different call sites of the test functions.

My experience tells me that it’s not possible to anticipate how others will use test results for their own debug draws, so I usually put trivial or common debug draw logic inside the test functions, and leave use-case specific debug draw logic to the client code calling the test functions. In the attack example, the `TryAttack` function body is considered client code that uses the test functions.

I generally follow this pattern:

    void TryAttack()
    {
      // facing
      FacingData facingData = PrepareFacingData();
      FacingResults facingResults = TestFacing(facingData);
      if (facingResults.IsFacingValid())
      {
        DebugDrawFacingSuccess(facingResults.GetSuccessInfo());
      }
      else
      {
        DebugDrawFacingFailure(facingResults.GetFailureInfo());
        return;
      }

      // weapon
      WeaponData weaponData = PrepareWeaponData();
      WeaponResults weaponResults = TestWeaponReady(weaponData);
      if (weaponResults.IsWeaponReady())
      {
        DebugDrawWeaponSuccess(weaponResults.GetSuccessInfo());
      }
      else
      {
        DebugDrawWeaponFailure(weaponResults.GetFailureInfo());
        return;
      }

      // path
      PathData pathData = PreparePathData();
      PathResults pathResults = TestPathClear(pathData);
      if (pathResults.IsPathClear())
      {
        DebugDrawPathSuccess(pathResults.GetSuccessInfo());
      }
      else
      {
        DebugDrawPathFailure(pathResults.GetFailureInfo());
        return;
      }

      // final logic
      Attack();
    }

This pattern is, again, in the early-out style.

If we use the single-point-of-return style, the code above can turn into this:

    void TryAttack()
    {
      FacingData facingData = PrepareFacingData();
      FacingResults facingResults = TestFacing(facingData);
      if (facingResults.IsFacingValid())
      {
        DebugDrawFacingSuccess(facingResults.GetSuccessInfo());

        WeaponData weaponData = PrepareWeaponData();
        WeaponResults weaponResults = TestWeaponReady(weaponData);
        if (weaponResults.IsWeaponReady())
        {
          DebugDrawWeaponSuccess(weaponResults.GetSuccessInfo());

          PathData pathData = PreparePathData();
          PathResults pathResults = TestPathClear(pathData);
          if (pathResults.IsPathClear())
          {
            DebugDrawPathSuccess(pathResults.GetSuccessInfo());

            // final logic: only reached when every test has passed
            Attack();
          }
          else
          {
            DebugDrawPathFailure(pathResults.GetFailureInfo());
          }
        }
        else
        {
          DebugDrawWeaponFailure(weaponResults.GetFailureInfo());
        }
      }
      else
      {
        DebugDrawFacingFailure(facingResults.GetFailureInfo());
      }
    }

The call to `DebugDrawFacingFailure` is all the way down inside the bottom `else` block. This is bad in terms of code locality. When I see the call to `DebugDrawFacingFailure` at the end, I have to trace all the way up to find its corresponding condition test.

There are single-point-of-return alternatives that can improve debug draw locality, but it’s still always going to be a challenge to make clean cuts to separate code into chunks that fully contain reference to individual tests. Later test chunks will always need to reference earlier test results.

Sometimes it’s preferable to force debug draws for all test results, even when early tests fail. In this case, we don’t care about the effect of short-circuit evaluation any more.

This is the pattern I follow that adds a flag to force all debug draws, which in turn could be toggled by a debug option:

    void TryAttack(bool forceAllDebugDraws)
    {
      bool anyTestFailed = false;

      // facing
      const FacingResults facingResults = TestFacing();
      if (facingResults.IsFacingValid())
      {
        DebugDrawFacingSuccess(facingResults.GetSuccessInfo());
      }
      else
      {
        DebugDrawFacingFailure(facingResults.GetFailureInfo());
        anyTestFailed = true;
        if (!forceAllDebugDraws)
          return;
      }

      // weapon
      const WeaponResults weaponResults = TestWeaponReady();
      if (weaponResults.IsWeaponReady())
      {
        DebugDrawWeaponSuccess(weaponResults.GetSuccessInfo());
      }
      else
      {
        DebugDrawWeaponFailure(weaponResults.GetFailureInfo());
        anyTestFailed = true;
        if (!forceAllDebugDraws)
          return;
      }

      // path
      const PathResults pathResults = TestPathClear();
      if (pathResults.IsPathClear())
      {
        DebugDrawPathSuccess(pathResults.GetSuccessInfo());
      }
      else
      {
        DebugDrawPathFailure(pathResults.GetFailureInfo());
        anyTestFailed = true;
        if (!forceAllDebugDraws)
          return;
      }

      // we get here with a failed test only if all debug draws are forced
      // don't perform attack if any test has failed
      if (anyTestFailed)
        return;

      // final logic
      Attack();
    }

If the flag to force all debug draws is set to true, all condition tests as well as debug draws will be executed. But the final `Attack` function call still wouldn’t be reached, because it’s guarded by a flag keeping track of whether any test has failed.

You might have already foreseen that if the number of tests grows large, we can end up with a lot of duplicated code and logic structure: the use of `anyTestFailed` & `forceAllDebugDraws` inside the `else` blocks, and the `if` statements branching into success & failure debug draws.

If you’re willing to make a sacrifice to prepare a single master data struct at the start, which is to be passed into all test functions declared with the same signature, plus a master results struct that holds all test info for debug draws, here’s one alternative pattern for your consideration:

    // these are C++ function pointer types
    // if you're using C#, think of them as delegates
    typedef bool (* TestFunc) (const Data &data, Results &results);
    typedef bool (* DebugDrawFunc) (const Results &results);

    struct TestSpec
    {
      TestFunc m_func;
      DebugDrawFunc m_debugDrawSuccess;
      DebugDrawFunc m_debugDrawFailure;
    };

    void TryAttack(bool forceAllDebugDraws)
    {
      // define sets of test function and debug draw functions
      TestSpec testFuncs[] =
      {
        { // facing
          TestFacing,
          DebugDrawFacingSuccess,
          DebugDrawFacingFailure
        },
        { // weapon
          TestWeaponReady,
          DebugDrawWeaponSuccess,
          DebugDrawWeaponFailure
        },
        { // path
          TestPathClear,
          DebugDrawPathSuccess,
          DebugDrawPathFailure
        }
      };

      // iterate through each test
      Data masterData = PrepareMasterData();
      Results masterResults;
      bool anyTestFailed = false;
      for (TestSpec &spec : testFuncs)
      {
        bool success = spec.m_func(masterData, masterResults);
        if (success)
        {
          spec.m_debugDrawSuccess(masterResults);
        }
        else
        {
          spec.m_debugDrawFailure(masterResults);
          anyTestFailed = true;
          if (!forceAllDebugDraws)
            return;
        }
      }

      if (anyTestFailed)
        return;

      Attack();
    }

When a new test function and its success & failure debug draw functions are defined, simply add the function set to the `testFuncs` array. There is only one shared code structure (the range-based `for` loop) that runs the tests, selects the success or failure debug draw functions to call, and optionally performs early outs.
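For readers who want to try the table-driven pattern in isolation, here is a self-contained C++ sketch. Everything in it is a stand-in: the `Data` fields, the empty `Results` struct, and the `g_drawLog` vector that replaces real debug drawing are assumptions for illustration only, and `TryAttack` returns a `bool` instead of calling `Attack` so the outcome can be checked.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical master structs; the fields are placeholders for illustration.
struct Data { bool facingOk; bool weaponOk; bool pathOk; };
struct Results {};

// Stand-in for real debug drawing: records which draw functions were called.
static std::vector<std::string> g_drawLog;

typedef bool (*TestFunc)(const Data &data, Results &results);
typedef void (*DebugDrawFunc)(const Results &results);

struct TestSpec {
  TestFunc test;
  DebugDrawFunc drawSuccess;
  DebugDrawFunc drawFailure;
};

bool TestFacing(const Data &d, Results &) { return d.facingOk; }
bool TestWeaponReady(const Data &d, Results &) { return d.weaponOk; }
bool TestPathClear(const Data &d, Results &) { return d.pathOk; }

void DrawFacingSuccess(const Results &) { g_drawLog.push_back("facing ok"); }
void DrawFacingFailure(const Results &) { g_drawLog.push_back("facing failed"); }
void DrawWeaponSuccess(const Results &) { g_drawLog.push_back("weapon ok"); }
void DrawWeaponFailure(const Results &) { g_drawLog.push_back("weapon failed"); }
void DrawPathSuccess(const Results &) { g_drawLog.push_back("path ok"); }
void DrawPathFailure(const Results &) { g_drawLog.push_back("path failed"); }

// Returns true if every test passed, i.e. the attack would be performed.
bool TryAttack(const Data &data, bool forceAllDebugDraws) {
  static const TestSpec kSpecs[] = {
    { TestFacing, DrawFacingSuccess, DrawFacingFailure },
    { TestWeaponReady, DrawWeaponSuccess, DrawWeaponFailure },
    { TestPathClear, DrawPathSuccess, DrawPathFailure },
  };

  Results results;
  bool anyTestFailed = false;
  for (const TestSpec &spec : kSpecs) {
    if (spec.test(data, results)) {
      spec.drawSuccess(results);
    } else {
      spec.drawFailure(results);
      anyTestFailed = true;
      if (!forceAllDebugDraws) return false;  // early out
    }
  }
  return !anyTestFailed;
}
```

With a failing weapon test, the unforced call stops after two draws, while the forced call runs all three tests and still reports failure.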

Finally, the `TryAttack` function may grow to a point where its purpose is no longer trivially clear. Recall the refactored variation above where all the condition tests are extracted into a separate `CanAttack` function:

    void TryAttack()
    {
      if (CanAttack())
        Attack();

      PostAttackTry();
    }

This seems like a good change no matter how the conditional tests are done, as it makes the intention of `TryAttack` crystal clear to the reader. I’d do this if readability is compromised due to function length.

That’s it! I’ve shared the coding patterns I think help make code readable and debuggable.

Using the early-out style as foundation, I’ve shown how to write debuggable conditions, achieve debug draw locality, and optionally force all debug draws.

I don’t consider these patterns exciting or groundbreaking, but I find them very useful, and I hope you do, too.

It occurred to me that the entire time I’ve been working with quaternions, I have never read or learned about the derivation of the formula for slerp, spherical linear interpolation. I just learned the final formula and have been using it.

Upon a preliminary search I couldn’t seem to immediately find a straightforward derivation, either (at least not one that fits in the context of game development). So I thought it might be a fun exercise to derive it myself.

As it turns out, it is indeed fun and could probably serve as an interesting trigonometry & vector quiz question!

A quick recap: slerp is an operation that interpolates between two vectors along the shortest arc (in any dimension higher than 1D). It takes as input the two vectors to interpolate between, $a$ and $b$, plus an interpolation parameter $t$:

$$\mathrm{slerp}(a, b, t) = \frac{\sin((1 - t)\Omega)}{\sin\Omega}\, a + \frac{\sin(t\Omega)}{\sin\Omega}\, b$$

where $\Omega$ is the angle between the two vectors:

$$\Omega = \cos^{-1}\left(\frac{a \cdot b}{|a||b|}\right)$$

If the interpolation parameter $t$ changes at a constant rate, the angular velocity of the slerp result is also constant. If $t$ is set to $0.25$, it means the slerp result is “the 25% waypoint” on the arc from $a$ to $b$: the angle between $a$ and the slerp result is $0.25\,\Omega$, and the angle between $b$ and the slerp result is $0.75\,\Omega$.

In the context of game development, slerp is typically used to interpolate between orientations represented by quaternions, which can be expressed as 4D vectors. In this case the shortest arc slerp interpolates across lies on a 4D hypersphere.

As mentioned before, this formula can be used on any vectors in any dimension higher than 1D. So it can also be used to interpolate between two 3D vectors along a sphere, or between two 2D vectors along a circle.

In the context of game development, we almost exclusively work with unit quaternions. So in my derivation, I make the assumption that the vectors we are working with are all unit vectors. The flow of the derivation should be pretty much the same even if the vectors are not unit vectors.

Without further ado, here’s the derivation.

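Let $r$ be the result of slerp:

$$r = \mathrm{slerp}(a, b, t)$$

And let $\Omega$ be the angle between $a$ and $b$.

Knowing that the angle between $a$ and $r$ is $t\Omega$, and the angle between $r$ and $b$ is $(1 - t)\Omega$, we can come up with this figure:

Here’s the strategy. We build a pair of orthogonal unit axes $e_0$ and $e_1$ from $a$ and $b$. Then, we use the parametric circle formula to find $r$:

$$r = \cos(t\Omega)\, e_0 + \sin(t\Omega)\, e_1$$

Since $a$ is already a unit vector that conveniently lies on the horizontal axis in the figure, let’s just pick $e_0 = a$. Then $e_1$ can be found by taking away the component of $b$ that is parallel to $a$ and normalizing the remainder. Because $a$ and $b$ are unit vectors, $a \cdot b = \cos\Omega$ and the remainder has length $\sin\Omega$:

$$e_1 = \frac{b - (a \cdot b)\, a}{|b - (a \cdot b)\, a|} = \frac{b - \cos\Omega\; a}{\sin\Omega}$$

Now plug $e_0$ and $e_1$ back into the parametric circle formula:

$$r = \cos(t\Omega)\, a + \sin(t\Omega)\, \frac{b - \cos\Omega\; a}{\sin\Omega} = \frac{\sin\Omega\cos(t\Omega) - \cos\Omega\sin(t\Omega)}{\sin\Omega}\, a + \frac{\sin(t\Omega)}{\sin\Omega}\, b = \frac{\sin((1 - t)\Omega)}{\sin\Omega}\, a + \frac{\sin(t\Omega)}{\sin\Omega}\, b$$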

And voila! We have our slerp formula.
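As a quick numerical sanity check of the slerp formula, here is a small C++ sketch for 3D unit vectors. The `Vec3` struct, the function names, and the fallback to plain lerp when the sine of the angle is close to zero are my own assumptions for illustration, not part of the derivation.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

double Dot(const Vec3 &a, const Vec3 &b) {
  return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Slerp between two unit vectors. When the vectors are nearly parallel or
// opposite, sin(omega) is close to zero and the formula would divide by ~0,
// so this sketch falls back to plain lerp weights in that case.
Vec3 Slerp(const Vec3 &a, const Vec3 &b, double t) {
  double cosOmega = Dot(a, b);
  if (cosOmega > 1.0) cosOmega = 1.0;    // guard acos against rounding error
  if (cosOmega < -1.0) cosOmega = -1.0;
  const double omega = std::acos(cosOmega);
  const double sinOmega = std::sin(omega);

  double wa = 1.0 - t;
  double wb = t;
  if (sinOmega > 1e-9) {
    wa = std::sin((1.0 - t) * omega) / sinOmega;
    wb = std::sin(t * omega) / sinOmega;
  }
  return Vec3{wa * a.x + wb * b.x, wa * a.y + wb * b.y, wa * a.z + wb * b.z};
}
```

For two perpendicular unit vectors and an interpolation parameter of 0.25, the result stays unit length and sits a quarter of the way along the arc, which matches the “25% waypoint” description.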

*Edit: Eric Lengyel has pointed out there’s another way to derive the slerp formula using similar triangles, presented in his Mathematics for 3D Game Programming and Computer Graphics, 3rd ed., Section 4.6.3.*

Source files are on GitHub.

Shortcut to sterp implementation.

Shortcut to code used to generate animations in this post.

Slerp, spherical linear interpolation, is an operation that interpolates from one orientation to another, using a rotational axis paired with the smallest angle possible.

Quick note: Jonathan Blow explains here how you should avoid using slerp, if normalized quaternion linear interpolation (nlerp) suffices. Long story short, nlerp is faster but does not maintain constant angular velocity, while slerp is slower but maintains constant angular velocity; use nlerp if you’re interpolating across small angles or you don’t care about constant angular velocity; use slerp if you’re interpolating across large angles and you care about constant angular velocity. But for the sake of using a more commonly known and used building block, the rest of this post will only mention slerp. Replacing all following occurrences of slerp with nlerp would not change the validity of this post.

In general, slerp is considered superior over interpolating individual components of Euler angles, as the latter method usually yields orientational sways.

But, sometimes slerp might not be ideal. Look at the image below showing two different orientations of a rod. On the left is one orientation, and on the right is the resulting orientation of rotating around the axis shown as a cyan arrow, where the pivot is at one end of the rod.

If we slerp between the two orientations, this is what we get:

Mathematically, slerp takes the “shortest rotational path”. The quaternion representing the rod’s orientation travels along the shortest arc on a 4D hypersphere. But, given the rod’s elongated appearance, the rod’s moving end seems to be deviating from the shortest arc on a 3D sphere.

My intended effect here is for the rod’s moving end to travel along the shortest arc in 3D, like this:

The difference is more obvious if we compare them side-by-side:

This is where swing-twist decomposition comes in.

Swing-Twist decomposition is an operation that splits a rotation into two concatenated rotations, swing and twist. Given a twist axis, we would like to separate out the portion of a rotation that contributes to the twist around this axis, and what’s left behind is the remaining swing portion.

There are multiple ways to derive the formulas, but this particular one by Michaele Norel seems to be the most elegant and efficient, and it’s the only one I’ve come across that does not involve any use of trigonometry functions. I will first show the formulas now and then paraphrase his proof later:

Given a rotation represented by a quaternion $q = [W, \vec{V}]$ and a twist axis $\vec{V}_T$, combine the scalar part of $q$ with the projection of $\vec{V}$ onto $\vec{V}_T$ to form a new quaternion:

$$q_T = [W, \mathrm{proj}_{\vec{V}_T}(\vec{V})]$$

We want to decompose $q$ into a swing component and a twist component. Let $q_S$ denote the swing component, so we can write $q = q_S q_T$. The swing component is then calculated by multiplying $q$ with the inverse (conjugate) of $q_T$:

$$q_S = q\, q_T^{-1}$$

Beware that $q_S$ and $q_T$ are not yet normalized at this point. It’s a good idea to normalize them before use, as unit quaternions are just cuter.

Below is my code implementation of swing-twist decomposition. Note that it also takes care of the singularity that occurs when the rotation to be decomposed represents a 180-degree rotation.

    public static void DecomposeSwingTwist
    (
      Quaternion q,
      Vector3 twistAxis,
      out Quaternion swing,
      out Quaternion twist
    )
    {
      Vector3 r = new Vector3(q.x, q.y, q.z);

      // singularity: rotation by 180 degree
      if (r.sqrMagnitude < MathUtil.Epsilon)
      {
        Vector3 rotatedTwistAxis = q * twistAxis;
        Vector3 swingAxis = Vector3.Cross(twistAxis, rotatedTwistAxis);

        if (swingAxis.sqrMagnitude > MathUtil.Epsilon)
        {
          float swingAngle = Vector3.Angle(twistAxis, rotatedTwistAxis);
          swing = Quaternion.AngleAxis(swingAngle, swingAxis);
        }
        else
        {
          // more singularity:
          // rotation axis parallel to twist axis
          swing = Quaternion.identity; // no swing
        }

        // always twist 180 degree on singularity
        twist = Quaternion.AngleAxis(180.0f, twistAxis);
        return;
      }

      // meat of swing-twist decomposition
      Vector3 p = Vector3.Project(r, twistAxis);
      twist = new Quaternion(p.x, p.y, p.z, q.w);
      twist = Normalize(twist);
      swing = q * Quaternion.Inverse(twist);
    }
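The core of the decomposition can also be checked outside Unity. Below is a hedged C++ sketch with a hand-rolled `Quat` type; the struct layout, the helper names, and the choice to report an identity twist in the degenerate case are assumptions of this sketch, and it does not reproduce the singularity handling of the C# version above.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
struct Quat { double x, y, z, w; };

// Hamilton product a * b.
Quat Mul(const Quat &a, const Quat &b) {
  return Quat{
    a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
    a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
    a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w,
    a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z};
}

Quat Conjugate(const Quat &q) { return Quat{-q.x, -q.y, -q.z, q.w}; }

Quat Normalize(const Quat &q) {
  const double n = std::sqrt(q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w);
  return Quat{q.x / n, q.y / n, q.z / n, q.w / n};
}

// Decomposes a unit quaternion q into swing * twist for a unit twistAxis.
// If the twist candidate has near-zero length, this sketch simply reports
// an identity twist instead of handling the singularity properly.
void DecomposeSwingTwist(const Quat &q, const Vec3 &twistAxis,
                         Quat &swing, Quat &twist) {
  // Project the vector part of q onto the twist axis.
  const double d = q.x * twistAxis.x + q.y * twistAxis.y + q.z * twistAxis.z;
  const Quat candidate{d * twistAxis.x, d * twistAxis.y, d * twistAxis.z, q.w};
  const double lenSq = candidate.x * candidate.x + candidate.y * candidate.y +
                       candidate.z * candidate.z + candidate.w * candidate.w;
  twist = (lenSq > 1e-12) ? Normalize(candidate) : Quat{0.0, 0.0, 0.0, 1.0};
  // For a unit quaternion, the inverse is the conjugate.
  swing = Mul(q, Conjugate(twist));
}
```

Two properties are worth checking on any input: the vector part of the swing has no component along the twist axis, and multiplying swing by twist gives back the original rotation.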

Now that we have the means to decompose a rotation into swing and twist components, we need a way to use them to interpolate the rod’s orientation, replacing slerp.

Replacing slerp with the swing and twist components is actually pretty straightforward. Take the two quaternions representing the rod’s two orientations we are interpolating between. Given the interpolation parameter, we use it to find “fractions” of the swing and twist components and combine them together. Such fractions can be obtained by performing slerp from the identity quaternion to the individual components.

So we replace:

with:

From the rod example, we choose the twist axis to align with the rod’s longest side. Let’s look at the effect of the individual swing and twist components as the interpolation parameter varies over time below, swing on left and twist on right:

And as we concatenate these two components together, we get a swing-twist interpolation that rotates the rod such that its moving end travels in the shortest arc in 3D. Again, here is a side-by-side comparison of slerp (left) and swing-twist interpolation (right):

I decided to name my swing-twist interpolation function **sterp**. I think it’s cool because it sounds like it belongs to the function family of **lerp** and **slerp**. Here’s to hoping that this name catches on.

And here’s my code implementation:

    public static Quaternion Sterp
    (
      Quaternion a,
      Quaternion b,
      Vector3 twistAxis,
      float t
    )
    {
      Quaternion deltaRotation = b * Quaternion.Inverse(a);

      Quaternion swingFull;
      Quaternion twistFull;
      QuaternionUtil.DecomposeSwingTwist
      (
        deltaRotation,
        twistAxis,
        out swingFull,
        out twistFull
      );

      Quaternion swing = Quaternion.Slerp(Quaternion.identity, swingFull, t);
      Quaternion twist = Quaternion.Slerp(Quaternion.identity, twistFull, t);

      return twist * swing;
    }

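Lastly, let’s look at the proof for the swing-twist decomposition formulas. Write the rotation as $q = [W, \vec{V}]$, the twist axis as $\vec{V}_T$, and the swing and twist components as $q_S$ and $q_T$. All that needs to be proven is that the swing component does not contribute to any rotation around the twist axis, i.e. the rotational axis of $q_S$ is orthogonal to the twist axis.

Let $\vec{V}_\parallel$ denote the component of $\vec{V}$ parallel to $\vec{V}_T$, which can be obtained by projecting $\vec{V}$ onto $\vec{V}_T$:

$$\vec{V}_\parallel = \mathrm{proj}_{\vec{V}_T}(\vec{V})$$

Let $\vec{V}_\perp$ denote the component of $\vec{V}$ orthogonal to $\vec{V}_T$:

$$\vec{V}_\perp = \vec{V} - \vec{V}_\parallel$$

So the scalar-vector form of $q$ becomes:

$$q = [W, \vec{V}_\parallel + \vec{V}_\perp]$$

Using the quaternion multiplication formula, here is the scalar-vector form of the swing quaternion (taking the conjugate as the inverse, which only differs by a positive scale factor before normalization):

$$q_S = q\, q_T^{-1} = [W, \vec{V}_\parallel + \vec{V}_\perp]\,[W, -\vec{V}_\parallel] = [W^2 + \vec{V}_\parallel \cdot \vec{V}_\parallel,\; W \vec{V}_\perp + \vec{V}_\parallel \times \vec{V}_\perp]$$

Take notice of the vector part of the result:

$$W \vec{V}_\perp + \vec{V}_\parallel \times \vec{V}_\perp$$

This is a vector parallel to the rotational axis of $q_S$. Both $\vec{V}_\perp$ and $\vec{V}_\parallel \times \vec{V}_\perp$ are orthogonal to the twist axis $\vec{V}_T$, so we have shown that the rotational axis of $q_S$ is orthogonal to the twist axis. Hence, we have proven that the formulas for $q_S$ and $q_T$ are valid for swing-twist decomposition.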

That’s all.

Given a twist axis, I have shown how to decompose a rotation into a swing component and a twist component.

Such decomposition can be used for swing-twist interpolation, an alternative to slerp that interpolates between two orientations, which can be useful if you’d like some point on a rotating object to travel along the shortest arc.

I like to call such interpolation **sterp**.

Sterp is merely an alternative to slerp, not a replacement. Also, slerp is definitely more efficient than sterp. Most of the time slerp should work just fine, but if you find unwanted orientational sway on an object’s moving end, you might want to give sterp a try.

An application of swing-twist decomposition in 2D just came to mind.

If the twist axis is chosen to be orthogonal to the screen, then we can utilize swing-twist decomposition to use the orientation of objects in 3D to drive the rotation of 2D elements in screen space or some other data. The twist component represents exactly the portion of 3D rotation projected onto screen space.

However, in terms of performance, we might be better off just projecting a 3D object’s local axis onto screen space and finding the angle between it and a screen space axis. But then again, the swing-twist decomposition approach doesn’t have the singularity the projection approach has when the chosen local axis becomes orthogonal to the screen.

This post is part of my Game Programming Series.

Complete source code for the debug draw utility and Unity scene for generating the demo animation above can be found on GitHub. Here is a shortcut to the debug draw utility class. And here is a shortcut to the shaders.

A couple weeks ago, I documented how I implemented a wireframe Unity debug draw utility using cached mesh pools and vertex shaders.

Recently, I have upgraded the utility to now support various shaded styles, including solid color, flat-shaded, and smooth-shaded. This post is a documentation of my development process and how I solved some of the challenges on the way.

For each mesh rendered in wireframe style, the original mesh factory only needed to generate an array of unique vertices, along with an index array containing the vertex indices in either lines or line strip topology.

To generate a mesh to be rendered in solid color style, I reused the same unique vertex arrays, but the index arrays had to be changed to contain vertex indices in triangle topology, three indices per triangle.

Once the generation of meshes for solid color style was done, I decided counter-intuitively to first implement the “fancier” smooth-shaded style before the flat-shaded style, because the former was actually an easier incremental change from the solid color style. Taking spheres for example, the vertex array actually still didn’t need to be changed; I just had to create an array of normals that is the exact copy of the vertices. Recall from the previous post that in order to reduce numbers of cached meshes, I offloaded scaling to the vertex shaders and just generated meshes that are unit primitives. The normal of a vertex of a smooth-shaded unit sphere is just conveniently identical to the vertex positional vector.

Figuring out the index arrays for other smooth-shaded primitive meshes wasn’t as straightforward as spheres, but it wasn’t too hard either. I still didn’t need to change most of the vertex arrays and just had to figure out the proper accompanying normal array and index array. Cones were a notable exception, because even with smooth-shaded style, they still have some normal discontinuity along the base edges, which required duplicates of the base edge vertices with different normals.

Finally moving onto the flat-shaded style, most primitives required me to modify the generation of vertex arrays, normal arrays, and index arrays. Arrays of unique vertices no longer worked, because a vertex shared by multiple faces (triangles, quads, circles, etc.) would have a different normal on each face. For each face, a new set of vertices had to be put into the vertex array. Different primitives required slightly different techniques to generate the vertices for each face. Taking spheres for example again, for each longitudinal strip, two triangles connecting to the poles plus two triangles per quad along the strip were needed. The normals were simply computed with cross products of any two non-parallel vectors connecting vertices in each face.

I generally followed this pattern for triangles:

    Vector3[] aVert = new Vector3[numVerts];
    Vector3[] aNormal = new Vector3[numNormals];
    int[] aIndex = new int[numIndices];

    int iVert = 0;
    int iNormal = 0;
    int iIndex = 0;
    for (int i = 0; i < numIterations; ++i)
    {
      int iTriStart = iVert;

      aVert[iVert++] = ComputeTriVert0(i);
      aVert[iVert++] = ComputeTriVert1(i);
      aVert[iVert++] = ComputeTriVert2(i);

      Vector3 tri01 = aVert[iTriStart + 1] - aVert[iTriStart];
      Vector3 tri02 = aVert[iTriStart + 2] - aVert[iTriStart];
      Vector3 triNormal = Vector3.Cross(tri01, tri02).normalized;

      aNormal[iNormal++] = triNormal;
      aNormal[iNormal++] = triNormal;
      aNormal[iNormal++] = triNormal;

      aIndex[iIndex++] = iTriStart;
      aIndex[iIndex++] = iTriStart + 1;
      aIndex[iIndex++] = iTriStart + 2;
    }

And this pattern for quads:

    Vector3[] aVert = new Vector3[numVerts];
    Vector3[] aNormal = new Vector3[numNormals];
    int[] aIndex = new int[numIndices];

    int iVert = 0;
    int iNormal = 0;
    int iIndex = 0;
    for (int i = 0; i < numIterations; ++i)
    {
      int iQuadStart = iVert;

      aVert[iVert++] = ComputeQuadVert0(i);
      aVert[iVert++] = ComputeQuadVert1(i);
      aVert[iVert++] = ComputeQuadVert2(i);
      aVert[iVert++] = ComputeQuadVert3(i);

      Vector3 quad01 = aVert[iQuadStart + 1] - aVert[iQuadStart];
      Vector3 quad02 = aVert[iQuadStart + 2] - aVert[iQuadStart];
      Vector3 quadNormal = Vector3.Cross(quad01, quad02).normalized;

      aNormal[iNormal++] = quadNormal;
      aNormal[iNormal++] = quadNormal;
      aNormal[iNormal++] = quadNormal;
      aNormal[iNormal++] = quadNormal;

      aIndex[iIndex++] = iQuadStart;
      aIndex[iIndex++] = iQuadStart + 1;
      aIndex[iIndex++] = iQuadStart + 2;

      aIndex[iIndex++] = iQuadStart;
      aIndex[iIndex++] = iQuadStart + 2;
      aIndex[iIndex++] = iQuadStart + 3;
    }

The positional portion of the vertex shader for all styles is actually identical, so I wanted to find a way to avoid creating an extra set of vertex and fragment shaders just in order to add the logic for normals. Then I found out about Unity’s shader variant feature. By using the `shader_feature` keyword and `#ifdef`s in the shaders, combined with the `Material.EnableKeyword` method, I was able to choose from a collection of variants generated from a single master shader at run time for each primitive mesh type. I used the `NORMAL_ON` keyword for the normal feature.

As shown below, only when the `NORMAL_ON` keyword is enabled are normals included in the vertex structs.

    #pragma shader_feature NORMAL_ON

    struct appdata
    {
      float4 vertex : POSITION;
    #ifdef NORMAL_ON
      float3 normal : NORMAL;
    #endif
    };

    struct v2f
    {
      float4 vertex : SV_POSITION;
    #ifdef NORMAL_ON
      float3 normal : NORMAL;
    #endif
    };

The model-view matrix is used to transform vertex positions from object space into view space, but normals need to be transformed using the inverse transpose of the model-view matrix. Since the scaling is offloaded to the shader, I needed to fold in the scaling portion of the inverse transpose of the model-view matrix myself.

    v2f vert (appdata v)
    {
      v2f o;

      // ...

    #ifdef NORMAL_ON
      float4x4 scaleInverseTranspose = float4x4
      (
        1.0f / _Dimensions.x, 0.0f, 0.0f, 0.0f,
        0.0f, 1.0f / _Dimensions.y, 0.0f, 0.0f,
        0.0f, 0.0f, 1.0f / _Dimensions.z, 0.0f,
        0.0f, 0.0f, 0.0f, 1.0f
      );
      float4x4 m = mul(UNITY_MATRIX_IT_MV, scaleInverseTranspose);
      o.normal = mul(m, float4(v.normal, 0.0f)).xyz;
    #endif

      return o;
    }
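To see why normals need the inverse transpose rather than the scale itself, here is a small CPU-side C++ sketch covering only the scaling part. The `Vec3` struct and function names are hypothetical; for a diagonal scale matrix, the inverse transpose is simply the component-wise reciprocal, which is what the shader’s `scaleInverseTranspose` matrix encodes.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

double Dot(const Vec3 &a, const Vec3 &b) {
  return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Positions (and therefore surface tangents) are scaled component-wise.
Vec3 ScaleVector(const Vec3 &v, const Vec3 &scale) {
  return Vec3{v.x * scale.x, v.y * scale.y, v.z * scale.z};
}

// Normals use the inverse transpose of the scale matrix, i.e. the
// component-wise reciprocal, followed by renormalization.
// Assumes no scale component is zero.
Vec3 ScaleNormal(const Vec3 &n, const Vec3 &scale) {
  const Vec3 s{n.x / scale.x, n.y / scale.y, n.z / scale.z};
  const double len = std::sqrt(Dot(s, s));
  return Vec3{s.x / len, s.y / len, s.z / len};
}
```

Take a tangent of (1, 1, 0) with a normal of (1, -1, 0) and scale by (2, 1, 1): the inverse-transpose normal stays perpendicular to the scaled tangent, while a normal scaled like a position does not.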

I also used the `shader_feature` keyword to optionally activate the “cap shift/scaling” logic for cylinders and capsules. Recall from the previous post that in order not to generate a mesh for each possible height, only unit-height cylinder and capsule meshes are generated, and the caps are shifted towards the X-Z plane, scaled, and then shifted back to the final height. I used the `CAP_SHIFT_SCALE` keyword for this feature.

    #pragma shader_feature CAP_SHIFT_SCALE

    // (x, y, z) == (dimensionX, dimensionY, dimensionZ)
    // w == capShiftScale
    // (shifts 0.5 towards X-Z plane, scales by dimensions,
    //  and then shifts back 0.5 * capShiftScale)
    float4 _Dimensions;

    v2f vert (appdata v)
    {
      v2f o;

    #ifdef CAP_SHIFT_SCALE
      const float ySign = sign(v.vertex.y);
      v.vertex.y -= ySign * 0.5f;
    #endif

      v.vertex.xyz *= _Dimensions.xyz;

    #ifdef CAP_SHIFT_SCALE
      v.vertex.y += ySign * 0.5f * _Dimensions.w;
    #endif

      o.vertex = UnityObjectToClipPos(v.vertex);

      // ...

      return o;
    }

I noticed some Z-fighting between the two styles when I drew the same meshes twice, once in wireframe style and once in shaded style. It was actually an easy fix. I just added a small Z-bias to make sure the wireframe lines are always drawn in front of the shaded pixels.

    float _ZBias;

    v2f vert (appdata v)
    {
      v2f o;

      // ...

      o.vertex = UnityObjectToClipPos(v.vertex);
      o.vertex.z += _ZBias;

      // ...

      return o;
    }

And finally here’s the fragment shader. It really doesn’t contain anything out of the ordinary, except that it remaps the vertex brightness from (0.0, 1.0) to (0.3, 1.0), because I really don’t like completely black pixels.

    fixed4 frag (v2f i) : SV_Target
    {
      fixed4 color = _Color;

    #ifdef NORMAL_ON
      i.normal = normalize(i.normal);
      color.rgb *= 0.7f * i.normal.z + 0.3f; // darkest at 0.3f
    #endif

      return color;
    }

That’s it! I am pretty satisfied with the current Unity debug draw utility. It’s also easy to combine primitives to make more interesting shapes, such as the arrows shown in the demo animation above.

Potentially, the meshes for flat-shaded and smooth-shaded styles, generated from the mesh factory, can be used to implement a gizmo utility. But I’ll probably only do it when I really need it.

Stay tuned for more documentation of my future venture into Unity land.

Until next time!

This post is part of my Game Programming Series.

Complete source code for the debug draw utility and Unity scene for generating the demo animation above can be found on GitHub. Here is a shortcut to the debug draw utility class. And here is a shortcut to the shaders.

I’ve recently started picking up Unity, and quickly found out that the only easily accessible debug draw function is `Debug.DrawLine`, unless I was mistaken (in which case please do let me know).

So I thought it was a good opportunity to familiarize myself with Unity’s environment and a great exercise to implement a debug draw utility that draws various primitives, including rectangles, boxes, spheres, cylinders, and capsules. This post is essentially a quick documentation of what I have done and problems I’ve encountered.

As my first iteration, I took the naive approach and just wrote functions that internally make a bunch of calls to `Debug.DrawLine`. You can see this first attempt here in the history.

The majority of the time spent was pretty much figuring out the right math, so nothing special. I guess the only thing worth pointing out is how I arranged the loops in the functions for spheres and capsules. My first instinct was to draw “from top to bottom”, looping from one pole to the other and constructing rings of line segments along the way, with special cases at the poles handled outside the loop. However, I didn’t like the idea of part of the math outside the loop, as it didn’t feel elegant enough (note: this is just my personal preference). So I came up with a different way of doing it, where I “assemble identical longitudinal pieces” around the central axis that connects the poles. In this case, there are no special cases outside the loop body.

After my first attempt, I got curious as to how other people debug draw spheres in Unity, and I came across this gist. This is when it occurred to me that I can get better performance by caching the mathematical results into meshes, and then simply draw the cached meshes, as well as offloading some of the work onto the GPU with vertex shaders.

There are a bunch of primitives in my debug draw utility, so I won’t enumerate every single one of them. I’ll just use the capsule as an example.

I didn’t want to create a new mesh for every single combination of height, radius, latitudinal segments, and longitudinal segments, because you can have so many different combinations of floats that it’s impractical. Instead, I used just the latitudinal and longitudinal segments to generate a key for each cached mesh, and modified the vertices in the vertex shader with height and radius as shader input.

    private static Dictionary<int, Mesh> s_meshPool;
    private static Material s_material;
    private static MaterialPropertyBlock s_matProperties;

    public static void DrawCapsule(...)
    {
      if (latSegments <= 0 || longSegments <= 1)
        return;

      if (s_meshPool == null)
        s_meshPool = new Dictionary<int, Mesh>();

      int meshKey = ((latSegments << 16) ^ longSegments);
      Mesh mesh;
      if (!s_meshPool.TryGetValue(meshKey, out mesh))
      {
        mesh = new Mesh();

        // ...

        s_meshPool.Add(meshKey, mesh);
      }

      if (s_material == null)
      {
        s_material = new Material(Shader.Find("CjLib/CapsuleWireframe"));
      }

      if (s_matProperties == null)
        s_matProperties = new MaterialPropertyBlock();

      s_matProperties.SetColor("_Color", color);
      s_matProperties.SetVector("_Dimensions", new Vector4(height, radius));

      Graphics.DrawMesh(mesh, center, rotation, s_material, 0, null, 0, s_matProperties);
    }

And below is the vertex shader. I basically shift each cap towards the center, scale the vertices using the radius, and push them back out using the height. I used the `sign` function to effectively branch on which side of the XZ plane the vertices are on, without actually introducing a code branch in the shader.

    float4 _Dimensions; // (height, radius, *, *)

    v2f vert (appdata v)
    {
      v2f o;
      float ySign = sign(v.vertex.y);
      v.vertex.y -= ySign * 0.5f;
      v.vertex.xyz *= _Dimensions.y;
      v.vertex.y += ySign * 0.5f * _Dimensions.x;
      o.vertex = UnityObjectToClipPos(v.vertex);
      return o;
    }
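The same shift-scale-shift math can be traced on the CPU. The C++ sketch below assumes the cached unit capsule is a cylinder section spanning y from -0.5 to 0.5 with unit-radius hemispherical caps, so its poles sit at y = ±1.5; that layout, the `Vec3` struct, and the function names are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Mirrors the shader's sign(): returns -1, 0, or 1.
double Sign(double v) { return (v > 0.0) - (v < 0.0); }

// Applies the shader's cap shift and scaling to one object-space vertex.
Vec3 TransformCapsuleVertex(Vec3 v, double height, double radius) {
  const double ySign = Sign(v.y);
  v.y -= ySign * 0.5;           // shift the cap towards the XZ plane
  v.x *= radius;                // scale by radius
  v.y *= radius;
  v.z *= radius;
  v.y += ySign * 0.5 * height;  // push the cap back out by half the height
  return v;
}
```

Under that assumed layout, with a height of 4 and a radius of 0.5, the top pole ends up at half the height plus the radius, and a vertex on the rim of the top cap ends up at half the height with its horizontal distance equal to the radius.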

However, I spent 2 hours past midnight just scratching my head, trying to figure out why some of my debug draw meshes pop around as I shift and rotate the camera. It was as if the positional pops are dependent on the camera position and orientation, which was quite bizarre. It finally occurred to me that I might not have been consistently getting vertex positions in object space in the vertex shader, and based on that assumption I found this post that confirmed my suspicion.

Basically, Unity has draw call batching turned on by default, so it inconsistently passed in vertex positions to vertex shaders in either object space or world space. It’s actually stated in Unity’s documentation here under the not-so-obvious `DisableBatching` tag section, that vertex shaders operating in object space won’t work reliably if draw call batching is on.

Although the process of figuring out what went wrong was annoying, the fix was luckily quite simple: just disable draw call batching in the vertex shaders.

Tags { "DisableBatching" = "true" }

That’s it! I hope you find this post interesting. I will likely continue to document my ventures into the Unity world.

The source files for generating the animations in this post are on GitHub.

A Chinese translation of this post is available here (by Wayne Chen).

Timeslicing is a very useful technique to improve the performance of batched algorithms (multiple instances of the same algorithm): instead of running all instances of algorithms in a single frame, spread them across multiple frames.

For instance, if you have 100 NPCs in a game level, you typically don’t need to have every one of them make a decision in every single frame; having 50 NPCs make decisions in each frame would effectively reduce the decision performance overhead by 50%, 25 NPCs by 75%, and 20 NPCs by 80%.
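The arithmetic behind those percentages can be sketched as below. This is a minimal illustration, and the helper names `DecisionCostSaved` and `FramesBetweenDecisions` are made up for this example. The second helper shows the other side of the trade: the fewer decisions run per frame, the more frames each NPC waits between two of its own decisions.

```cpp
#include <cassert>

// Fraction of per-frame decision work saved when only updatesPerFrame
// out of numNpcs decisions run in each frame (hypothetical helper).
double DecisionCostSaved(unsigned numNpcs, unsigned updatesPerFrame)
{
    assert(numNpcs > 0 && updatesPerFrame <= numNpcs);
    return 1.0 - static_cast<double>(updatesPerFrame) / numNpcs;
}

// Number of frames between two decisions of the same NPC, rounded up
// (hypothetical helper).
unsigned FramesBetweenDecisions(unsigned numNpcs, unsigned updatesPerFrame)
{
    assert(updatesPerFrame > 0);
    return (numNpcs + updatesPerFrame - 1) / updatesPerFrame;
}
```

For 100 NPCs at 20 decisions per frame, this gives a saving of 80% and a gap of 5 frames between decisions for each NPC.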

Note that I said timeslicing the decisions, **not** the whole update logic of the NPCs. In every frame, we’d still want to animate every NPC, or at least the ones closer and more noticeable to the player, based on the **latest decision**. The extra animation layer can usually hide the slight latency in the timesliced decision layer.

Also bear in mind that I will not be discussing how to finish a single algorithm across multiple frames, which is another form of timeslicing that is not within the scope of this post. Rather, this post will focus on spreading multiple instances of the same algorithm across multiple frames, where each instance is small enough to fit in a single frame.

This timeslicing technique applies to batched algorithms that are not hyper-sensitive to latency. If even a single frame of latency is critical to certain batched algorithms, it’s probably not a good idea to timeslice them.

In this post, I’d like to cover:

- An example that involves running multiple instances of a simple algorithm in batch.
- How to timeslice such batched algorithms.
- A categorization for timeslicing based on the timing of input and output.
- A sample implementation of a timeslicer utility class.
- Finally, how threads can be brought into the mix.

The example I’m going to use is a simple logic that orients NPCs to face a target. Each NPC’s decision layer computes the desired orientation to face the target, and the animation layer tries to rotate the NPCs to match their desired orientation, capped at a maximum angular speed.

First, let’s see an animated illustration of what it might look like if this algorithm is run for every NPC in every frame (Update All).

The moving circle is the target, the black pointers represent NPCs and their orientation, and the red indicators represent the NPCs’ desired orientation.

And the code looks something like this:

    void NpcManager::UpdateFrame(float dt)
    {
        for (Npc &npc : m_npcs)
        {
            npc.UpdateDesiredOrientation(target);
            npc.Animate(dt);
        }
    }

    void Npc::UpdateDesiredOrientation(const Object &target)
    {
        m_desiredOrientation = LookAt(target);
    }

    void Npc::Animate(float dt)
    {
        Rotation delta = Diff(m_desiredOrientation, m_currentOrientation);
        delta = Limit(delta, m_maxAngularSpeed);
        m_currentOrientation = Apply(m_currentOrientation, delta);
    }
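The helpers `LookAt`, `Diff`, `Limit`, and `Apply` above are left undefined. As one possible reading, here is a minimal 2D sketch where orientations are plain angles in radians. The names `WrapAngle`, `LookAtAngle`, and `RotateToward` are hypothetical, and scaling the maximum angular speed by `dt` is an assumption on my part about what `Limit` is meant to do.

```cpp
#include <algorithm>
#include <cmath>

const float kPi = 3.14159265f;

// Wraps an angle in radians into the range [-pi, pi).
float WrapAngle(float a)
{
    a = std::fmod(a + kPi, 2.0f * kPi);
    if (a < 0.0f)
        a += 2.0f * kPi;
    return a - kPi;
}

// Decision layer: desired facing angle for an NPC at (x, y)
// looking at a target at (tx, ty).
float LookAtAngle(float x, float y, float tx, float ty)
{
    return std::atan2(ty - y, tx - x);
}

// Animation layer: rotates current toward desired along the shortest arc,
// turning by at most maxAngularSpeed * dt radians.
float RotateToward(float current, float desired, float maxAngularSpeed, float dt)
{
    const float maxStep = maxAngularSpeed * dt;
    float delta = WrapAngle(desired - current);
    delta = std::max(-maxStep, std::min(delta, maxStep));
    return WrapAngle(current + delta);
}
```

Calling `RotateToward` every frame with the latest value from `LookAtAngle` gives the capped turning behavior described above.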

As mentioned above, you typically don’t need to update all the NPCs’ decisions in one frame. We can achieve rudimentary timeslicing like this:

    void NpcManager::UpdateFrame(float dt)
    {
        // update the decisions of only a few NPCs per frame,
        // continuing from where the previous frame left off
        const unsigned kMaxUpdates = 4;
        unsigned npcUpdated = 0;
        while (npcUpdated < m_numNpcs && npcUpdated < kMaxUpdates)
        {
            m_npcs[m_iNpcWalker].UpdateDesiredOrientation(target);
            m_iNpcWalker = (m_iNpcWalker + 1) % m_numNpcs;
            ++npcUpdated;
        }

        // still animate every NPC in every frame
        for (Npc &npc : m_npcs)
        {
            npc.Animate(dt);
        }
    }
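To see the round-robin walker in isolation, here is a small self-contained sketch. `CountingNpc` and `RoundRobinUpdater` are hypothetical stand-ins for the NPC and manager classes; the NPC does nothing but count how many times it has made a decision.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an NPC; it only counts its decisions.
struct CountingNpc
{
    unsigned m_numDecisions = 0;
    void UpdateDecision() { ++m_numDecisions; }
};

class RoundRobinUpdater
{
public:
    explicit RoundRobinUpdater(std::size_t numNpcs) : m_npcs(numNpcs) {}

    // Runs at most maxUpdates decisions, resuming where the last frame stopped.
    void UpdateFrame(std::size_t maxUpdates)
    {
        std::size_t numUpdated = 0;
        while (numUpdated < m_npcs.size() && numUpdated < maxUpdates)
        {
            m_npcs[m_iWalker].UpdateDecision();
            m_iWalker = (m_iWalker + 1) % m_npcs.size();
            ++numUpdated;
        }
    }

    const CountingNpc &GetNpc(std::size_t i) const { return m_npcs[i]; }

private:
    std::vector<CountingNpc> m_npcs;
    std::size_t m_iWalker = 0; // index of the next NPC to update
};
```

With 10 NPCs and 4 updates per frame, the first frame covers NPCs 0 to 3, the second covers 4 to 7, and the third wraps around to cover 8, 9, 0, and 1.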

This straightforward approach could be enough. However, sometimes you just need more control over the timing of input and output. Using the more involved timeslicing logic presented below, you can have a choice of different timing of input and output to suit specific needs.

Before going any further, let’s take a look at the terminology that will be used throughout this post.

- Completing a **batch** means finishing running the decision logic once for each NPC.
- A **job** represents the work to run an instance of decision logic for an NPC.
- The **input** is the data required to run a job.
- The **output** is the results from a job after it’s finished.

Now the timeslicing logic.

Here are the steps of one way to timeslice batched algorithms. It’s probably not the absolute best in terms of efficiency or memory usage, but I find it logically clear and easy to maintain (which also means it’s good for presentational purposes). So unless you absolutely need to micro-optimize, I wouldn’t worry about it too much.

- Start a new batch.
- Figure out all the jobs that need to be done. Associate each job with a unique **key** that can be used to infer the required input for the job.
- For each job, prepare an instance of job **parameters** that is a collection of its key, input, and output.
- Start and finish up to a max number of jobs per frame.
- Depending on the timing of output (more on this later), **save** the **results** of a job, including the job’s output and its associated key, by pushing it to a **ring buffer** that represents the **history** of job results. The rest of the game logic can then query the latest results by key.
- After all jobs are finished, the batch is finished. Rinse and repeat.

One advantage of looking up output by key is that different timesliced systems can work with each other just fine, even if they reference each other’s output. As far as a system is concerned, it’s looking up output from another system using a key, and the other system is reporting back the latest valid output available associated with the given key. Sort of like a mini database.

In our example, since each job is associated with an NPC, it seems fitting to use the NPCs as individual keys.

Next, here’s a categorization of timeslicing, based on the timing of reading input and saving output.

NOTE: The use of words “synchronous” and “asynchronous” here has nothing to do with multi-threading. The words are only used to distinguish the timing of operations. Everything presented before the “Threads” section later in this post is single-threaded.

- **Asynchronous Input**: input is read by a job only when it’s started.
- **Synchronous Input**: input is read by all jobs when a new batch starts.
- **Asynchronous Output**: a job’s output is saved as soon as the job finishes.
- **Synchronous Output**: output of all jobs is saved when a batch finishes.

A ring buffer is used so that the rest of the game logic can be completely agnostic to the timing, and assume that the output (queried by key) is the latest.

Mixing and matching different timing of input and output gives 4 combinations. Async input / async output (AIAO), sync input / sync output (SISO), sync input / async output (SIAO), and async input / sync output (AISO). Let’s look at them one by one.

For demonstration purposes, all animated illustrations below reflect a setup where only one job is started in each frame. The number should be set higher in a real game if one job per frame introduces unacceptable latency.
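As a rough mental model of the four timing options, the sketch below computes, for a single batch that starts on frame 0, which frame each job reads its input on and which frame its output becomes visible on. `InputFrames` and `OutputFrames` are hypothetical helpers, not part of the timeslicer. The sketch assumes the behavior of the sample implementation later in this post, where synchronous output is saved at the start of the update that follows the batch’s last job.

```cpp
#include <cassert>
#include <vector>

// Frame on which each job of one batch reads its input.
// Sync input: every job reads on frame 0, when the batch starts.
// Async input: job i reads on the frame it is started.
std::vector<unsigned> InputFrames(unsigned batchSize, unsigned jobsPerFrame, bool syncInput)
{
    assert(jobsPerFrame > 0);
    std::vector<unsigned> frames(batchSize);
    for (unsigned i = 0; i < batchSize; ++i)
        frames[i] = syncInput ? 0 : i / jobsPerFrame;
    return frames;
}

// Frame on which each job's output becomes visible to the rest of the game.
// Async output: the frame the job runs on.
// Sync output: the first frame after the whole batch has finished.
std::vector<unsigned> OutputFrames(unsigned batchSize, unsigned jobsPerFrame, bool syncOutput)
{
    assert(jobsPerFrame > 0);
    const unsigned batchEndFrame = (batchSize + jobsPerFrame - 1) / jobsPerFrame;
    std::vector<unsigned> frames(batchSize);
    for (unsigned i = 0; i < batchSize; ++i)
        frames[i] = syncOutput ? batchEndFrame : i / jobsPerFrame;
    return frames;
}
```

For a batch of 4 jobs at 1 job per frame, asynchronous input reads on frames 0, 1, 2, 3, while synchronous input reads everything on frame 0.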

For our specific example of NPCs turning to face the target, the AIAO combination probably makes the most sense. The input is read only when the job starts, so the job has the latest position of the target. The output is saved as soon as the job finishes with results of NPC’s desired orientation, so the NPC’s animation layer can react to the latest desired orientation immediately.

Here’s an animated illustration of what it could look like if we run the jobs at 10Hz (10 NPC jobs per second).

And here’s what it looks like if done at 30Hz.

You can see that each NPC waits until its job starts before getting the latest position of the target, and updates its desired orientation as soon as the job finishes.

For cases where the asynchronous input of the AIAO combination shown above causes unwanted staggering, yet NPCs should still react as soon as each of their jobs finishes, we can use the SIAO combination.

Here’s the 10Hz version.

And here’s the 30Hz version.

Note that when each job starts, it’s using the same target position as input, which has been synchronized at the start of each batch, while the output is saved for immediate NPC reaction as soon as each job finishes.

This is effectively the same as the rudimentary first attempt at timeslicing shown above.

The SISO combination is probably best explained by looking at the animated illustrations first. In order, below are the 10Hz and 30Hz versions of this combination.

It’s basically a “laggy” version of the very first animated illustration where every NPC is fully updated in every frame. All job input is synchronized upon batch start, and all output is saved out upon batch finish. Essentially this is kind of a “double buffer”, where the latest results aren’t reflected until all jobs in a batch are finished. For this reason, the history ring buffer must be **at least twice as large** as the max batch size for combinations with **synchronized output** to work properly.

The SISO combination is probably not ideal for our specific example. However, for cases like updating influence maps, heat maps, or any kind of game space analysis, the SISO combination could prove useful.

To be frank, I can’t think of a proper scenario to justify the use of the AISO combination. It’s only included here for completeness. See the animated illustrations below, in the order of the 10Hz version and the 30Hz version. If you can think of a case where the AISO combination is a superior choice to the other three, please share your ideas in the comments or email me. I’d really like to know.

Now that we’ve seen all four combinations of timeslicing, it’s time to look at a sample implementation that does exactly what has been shown above.

Before going straight to the core timeslicing logic, let’s first look at how it plugs into the sample NPC code we saw earlier.

The timeslicer utility class allows users to provide a function that sets up keys for a new batch (writes to an array and returns the new batch size), a function that sets up input for a job (writes to input based on key), and a function that is the logic to be timesliced (writes to output based on key and input).

    class NpcManager
    {
    private:
        struct NpcJobInput
        {
            Point m_targetPos;
        };

        struct NpcJobOutput
        {
            Orientation m_desiredOrientation;
        };

        // timeslicing utility class
        Timeslicer
        <
            Npc*,         // key
            NpcJobInput,  // input
            NpcJobOutput, // output
            kMaxNpcs,     // max batch size
            false,        // sync input flag (false = async)
            false         // sync output flag (false = async)
        >
        m_npcTimeslicer;

        // ...other stuff
    };

    void NpcManager::Init()
    {
        // set up keys for new batch
        auto newBatchFunc = [this](Npc **aKey)->unsigned
        {
            for (unsigned i = 0; i < m_numNpcs; ++i)
            {
                aKey[i] = GetNpc(i);
            }
            return m_numNpcs;
        };

        // set up input for job
        auto setUpInputFunc = [this](Npc *pNpc, NpcJobInput *pInput)->void
        {
            pInput->m_targetPos = GetTargetPosition(pNpc);
        };

        // logic to be timesliced
        auto jobFunc = [this](Npc *pNpc, const NpcJobInput &input, NpcJobOutput *pOutput)->void
        {
            pOutput->m_desiredOrientation = LookAt(pNpc, input.m_targetPos);
        };

        // initialize timeslicer
        m_npcTimeslicer.Init
        (
            newBatchFunc,
            setUpInputFunc,
            jobFunc
        );
    }

    void NpcManager::UpdateFrame(float dt)
    {
        // timeslice decision logic
        m_npcTimeslicer.Update(maxJobsPerFrame);

        // animate all NPCs based on latest decision results
        for (Npc &npc : m_npcs)
        {
            // GetOutput returns true if a result exists for this NPC
            NpcJobOutput output;
            if (m_npcTimeslicer.GetOutput(&npc, &output))
            {
                npc.SetDesiredOrientation(output.m_desiredOrientation);
            }
            npc.Animate(dt);
        }
    }

And below is the timeslicer utility class in its entirety.

    template
    <
        typename Key,
        typename Input,
        typename Output,
        unsigned kMaxBatchSize,
        bool kSyncInput,
        bool kSyncOutput
    >
    class Timeslicer
    {
    private:
        struct JobParams
        {
            Key m_key;
            Input m_input;
            Output m_output;
        };

        struct JobResults
        {
            Key m_key;
            Output m_output;
        };

        // number of jobs in current batch
        unsigned m_batchSize;

        // keep track of jobs in current frame
        unsigned m_iJobBegin;
        unsigned m_iJobEnd;

        // required to start jobs
        JobParams m_aJobParams[kMaxBatchSize];

        // keep track of job results (statically allocated)
        static const unsigned kMaxHistorySize =
            kSyncOutput
                ? 2 * kMaxBatchSize // synchronous output needs twice the space
                : kMaxBatchSize;
        typedef RingBuffer<JobResults, kMaxHistorySize> History;
        History m_history;

        // set up keys for new batch
        // (number of keys = batch size = jobs per batch)
        typedef std::function<unsigned (Key *)> NewBatchFunc;
        NewBatchFunc m_newBatchFunc;

        // set up input for job
        typedef std::function<void (Key, Input *)> SetUpInputFunc;
        SetUpInputFunc m_setUpInputFunc;

        // logic to be timesliced
        // (takes key and input, writes output)
        typedef std::function<void (Key, const Input &, Output *)> JobFunc;
        JobFunc m_jobFunc;

    public:
        void Init
        (
            NewBatchFunc newBatchFunc,
            SetUpInputFunc setUpInputFunc,
            JobFunc jobFunc
        )
        {
            m_newBatchFunc = newBatchFunc;
            m_setUpInputFunc = setUpInputFunc;
            m_jobFunc = jobFunc;
            Reset();
        }

        void Reset()
        {
            m_batchSize = 0;
            m_iJobBegin = 0;
            m_iJobEnd = 0;
        }

        bool GetOutput(Key key, Output *pOutput) const
        {
            // iterate from newest history (last queued output)
            for (const JobResults &results : m_history.Reverse())
            {
                if (key == results.m_key)
                {
                    *pOutput = results.m_output;
                    return true;
                }
            }
            return false;
        }

        void Update(unsigned maxJobsPerUpdate)
        {
            TryStartNewBatch();
            StartJobs(maxJobsPerUpdate);
            FinishJobs();
        }

    private:
        void TryStartNewBatch()
        {
            if (m_iJobBegin == m_batchSize)
            {
                // synchronous output saved on batch finish
                if (kSyncOutput)
                {
                    for (unsigned i = 0; i < m_batchSize; ++i)
                    {
                        const JobParams &params = m_aJobParams[i];
                        SaveResults(params);
                    }
                }

                Reset();

                Key aKey[kMaxBatchSize];
                m_batchSize = m_newBatchFunc(aKey);
                for (unsigned i = 0; i < m_batchSize; ++i)
                {
                    JobParams &params = m_aJobParams[i];
                    params.m_key = aKey[i];

                    // synchronous input set up on new batch start
                    if (kSyncInput)
                    {
                        m_setUpInputFunc(params.m_key, &params.m_input);
                    }
                }
            }
        }

        void StartJobs(unsigned maxJobsPerUpdate)
        {
            unsigned numJobsStarted = 0;
            while (m_iJobEnd < m_batchSize
                   && numJobsStarted < maxJobsPerUpdate)
            {
                JobParams &params = m_aJobParams[m_iJobEnd];

                // asynchronous input set up on job start
                if (!kSyncInput)
                {
                    m_setUpInputFunc(params.m_key, &params.m_input);
                }

                m_jobFunc
                (
                    params.m_key,
                    params.m_input,
                    &params.m_output
                );

                ++m_iJobEnd;
                ++numJobsStarted;
            }
        }

        void FinishJobs()
        {
            while (m_iJobBegin < m_iJobEnd)
            {
                const JobParams &params = m_aJobParams[m_iJobBegin++];

                // asynchronous output saved on job finish
                if (!kSyncOutput)
                {
                    SaveResults(params);
                }
            }
        }

        void SaveResults(const JobParams &params)
        {
            JobResults results;
            results.m_key = params.m_key;
            results.m_output = params.m_output;

            // drop the oldest entry to make room for the newest
            if (m_history.IsFull())
            {
                m_history.Dequeue();
            }
            m_history.Enqueue(results);
        }
    };
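The class above relies on a `RingBuffer` type that is not shown in this post. Below is a minimal sketch of one that would satisfy the calls the timeslicer makes (`IsFull`, `Enqueue`, `Dequeue`, and a `Reverse` range that iterates from newest to oldest). This is my guess at a suitable interface rather than the actual class used, and it assumes the element type is default-constructible and the capacity is greater than zero.

```cpp
#include <cassert>

template <typename T, unsigned kCapacity>
class RingBuffer
{
public:
    // Forward-only iterator that walks from the newest element to the oldest.
    class ReverseIterator
    {
    public:
        ReverseIterator(const RingBuffer *pBuffer, unsigned numRemaining)
            : m_pBuffer(pBuffer), m_numRemaining(numRemaining) {}
        const T &operator*() const { return m_pBuffer->At(m_numRemaining - 1); }
        ReverseIterator &operator++() { --m_numRemaining; return *this; }
        bool operator!=(const ReverseIterator &rhs) const
        {
            return m_numRemaining != rhs.m_numRemaining;
        }
    private:
        const RingBuffer *m_pBuffer;
        unsigned m_numRemaining;
    };

    // Lightweight range so that Reverse() works with range-based for loops.
    struct ReverseRange
    {
        ReverseIterator m_begin;
        ReverseIterator m_end;
        ReverseIterator begin() const { return m_begin; }
        ReverseIterator end() const { return m_end; }
    };

    bool IsEmpty() const { return m_size == 0; }
    bool IsFull() const { return m_size == kCapacity; }

    // Adds value as the newest element; the buffer must not be full.
    void Enqueue(const T &value)
    {
        assert(!IsFull());
        m_aData[(m_iHead + m_size) % kCapacity] = value;
        ++m_size;
    }

    // Removes the oldest element; the buffer must not be empty.
    void Dequeue()
    {
        assert(!IsEmpty());
        m_iHead = (m_iHead + 1) % kCapacity;
        --m_size;
    }

    ReverseRange Reverse() const
    {
        return { ReverseIterator(this, m_size), ReverseIterator(this, 0) };
    }

private:
    // i-th element counted from the oldest (0 = oldest).
    const T &At(unsigned i) const { return m_aData[(m_iHead + i) % kCapacity]; }

    T m_aData[kCapacity];
    unsigned m_iHead = 0; // index of the oldest element
    unsigned m_size = 0;  // number of stored elements
};
```

Because `GetOutput` walks the history from the newest entry, the first match for a key is always the latest result saved for it, even if older results for the same key are still in the buffer.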

If your game engine allows multi-threading, we can go one step further by offloading jobs to threads. Starting a job now creates a thread that runs the timesliced logic, and finishing a job now waits for the thread to finish. We need to use read/write locks to make sure the timeslicer plays nicely with the rest of game logic. Required changes to code are highlighted below.

    class Timeslicer
    {
        // ...unchanged code omitted

        // added: guards the history against concurrent access
        // (mutable so that the const GetOutput can lock it)
        mutable RwLock m_lock;

        struct JobParams
        {
            std::thread m_thread; // added: thread that runs the job
            Key m_key;
            Input m_input;
            Output m_output;
        };

        bool GetOutput(Key key, Output *pOutput) const
        {
            ReadAutoLock readLock(m_lock); // added

            // iterate from newest history (last queued output)
            for (const JobResults &results : m_history.Reverse())
            {
                if (key == results.m_key)
                {
                    *pOutput = results.m_output;
                    return true;
                }
            }
            return false;
        }

        void TryStartNewBatch()
        {
            WriteAutoLock writeLock(m_lock); // added

            if (m_iJobBegin == m_batchSize)
            {
                // synchronous output saved on batch finish
                if (kSyncOutput)
                {
                    for (unsigned i = 0; i < m_batchSize; ++i)
                    {
                        const JobParams &params = m_aJobParams[i];
                        SaveResults(params);
                    }
                }

                Reset();

                Key aKey[kMaxBatchSize];
                m_batchSize = m_newBatchFunc(aKey);
                for (unsigned i = 0; i < m_batchSize; ++i)
                {
                    JobParams &params = m_aJobParams[i];
                    params.m_key = aKey[i];

                    // synchronous input set up on new batch start
                    if (kSyncInput)
                    {
                        m_setUpInputFunc(params.m_key, &params.m_input);
                    }
                }
            }
        }

        void StartJobs(unsigned maxJobsPerUpdate)
        {
            WriteAutoLock writeLock(m_lock); // added

            unsigned numJobsStarted = 0;
            while (m_iJobEnd < m_batchSize
                   && numJobsStarted < maxJobsPerUpdate)
            {
                JobParams &params = m_aJobParams[m_iJobEnd];

                // asynchronous input set up on job start
                if (!kSyncInput)
                {
                    m_setUpInputFunc(params.m_key, &params.m_input);
                }

                // added: run the job on its own thread
                // (capture this as well, since m_jobFunc is a member)
                params.m_thread = std::thread([this, &params]()->void
                {
                    m_jobFunc
                    (
                        params.m_key,
                        params.m_input,
                        &params.m_output
                    );
                });

                ++m_iJobEnd;
                ++numJobsStarted;
            }
        }

        void FinishJobs()
        {
            WriteAutoLock writeLock(m_lock); // added

            while (m_iJobBegin < m_iJobEnd)
            {
                JobParams &params = m_aJobParams[m_iJobBegin++];

                // added: wait for the job's thread to finish
                params.m_thread.join();

                // asynchronous output saved on job finish
                if (!kSyncOutput)
                {
                    SaveResults(params);
                }
            }
        }
    };
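The `RwLock`, `ReadAutoLock`, and `WriteAutoLock` types used above are not defined in this post. One way they could be sketched is as thin RAII wrappers around C++17’s `std::shared_mutex`. The method names `LockRead`, `UnlockRead`, `LockWrite`, and `UnlockWrite` are my own, and a real engine would likely have its own lock primitives instead.

```cpp
#include <shared_mutex>

// Hypothetical read/write lock built on std::shared_mutex (C++17).
// Many readers may hold it at once; a writer holds it exclusively.
class RwLock
{
public:
    void LockRead() { m_mutex.lock_shared(); }
    void UnlockRead() { m_mutex.unlock_shared(); }
    void LockWrite() { m_mutex.lock(); }
    void UnlockWrite() { m_mutex.unlock(); }

private:
    std::shared_mutex m_mutex;
};

// Holds a read lock for the lifetime of the object.
class ReadAutoLock
{
public:
    explicit ReadAutoLock(RwLock &lock) : m_lock(lock) { m_lock.LockRead(); }
    ~ReadAutoLock() { m_lock.UnlockRead(); }
    ReadAutoLock(const ReadAutoLock &) = delete;
    ReadAutoLock &operator=(const ReadAutoLock &) = delete;

private:
    RwLock &m_lock;
};

// Holds a write lock for the lifetime of the object.
class WriteAutoLock
{
public:
    explicit WriteAutoLock(RwLock &lock) : m_lock(lock) { m_lock.LockWrite(); }
    ~WriteAutoLock() { m_lock.UnlockWrite(); }
    WriteAutoLock(const WriteAutoLock &) = delete;
    WriteAutoLock &operator=(const WriteAutoLock &) = delete;

private:
    RwLock &m_lock;
};
```

Since `GetOutput` is a `const` member function that takes a read lock, the `m_lock` member would need to be declared `mutable` for these wrappers to compile against it.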

If your game can afford to have one more frame of latency and you don’t want the timeslicer squatting a thread, you can tweak the update function a bit, where jobs are started at the end of update in the current frame, and are finished at the beginning of update in the next frame.

    void Timeslicer::Update(unsigned maxJobsPerUpdate)
    {
        FinishJobs();                // finish jobs started in the previous frame
        TryStartNewBatch();
        StartJobs(maxJobsPerUpdate); // these run until the next frame's update
    }

That’s it! We’ve seen how timeslicing batched algorithms can help with game performance, as well as the 4 combinations of input and output with different timing, each having its own use (well, maybe not the last one). We’ve also seen how the timeslicing logic can be further adapted to make use of threads.

I hope you find this useful.
