Hacking Super Mario 64 using covering spaces (+ hyperbolic geometry)

Abstract

We explore covering spaces — spaces that locally look like the “repetitions” of a base space. A key insight is, that we can gain hyperbolic geometry by looking at the universal cover of a surface of genus \(g \geq 2\).

Covering spaces are a fundamental concept in topology. Some typical examples can be seen in this previous post.

Definition 1 A covering space of a topological space \(X\) is a topological space \(C\) together with a continuous surjective map \(p: C \to X\) such that for every point \(x \in X\), there exists an open neighborhood \(U\) of \(x\) such that \(p^{-1}(U) \cong F \times U\) for some discrete set \(F\) (called the fiber over \(x\)), and the map \(p\) restricted to each component of \(p^{-1}(U)\) is a homeomorphism onto \(U\).

Note

Definition 2 A homeomorphism between topological spaces \(X\) and \(Y\) is a bijective map \(f: X \to Y\) such that both \(f\) and its inverse \(f^{-1}: Y \to X\) are continuous. Explicitly:

\(f\) is a bijection,
\(f\) is continuous (preimages of open sets are open),
\(f^{-1}\) is continuous (images of open sets are open, i.e. \(f\) is an open map).

If such an \(f\) exists, \(X\) and \(Y\) are called homeomorphic, written \(X \cong Y\).

Homeomorphisms are precisely the isomorphisms in the category \(\mathbf{Top}\) of topological spaces and continuous maps.

Example of a homeomorphism between a doughnut and a coffee cup

Definition 3 A universal cover of a topological space \(X\) is a covering space \(\tilde{X}\) of \(X\) that is simply connected, meaning that it has no nontrivial loops. The universal cover is unique up to homeomorphism and serves as a “universal” object in the category of covering spaces of \(X\).

The universal cover of a doughnut

A filled torus (a doughnut) is a 3-manifold homeomorphic to \(S^1 \times D^2\), where \(D^2\) is the 2-dimensional disk. There exists a deformation retract from the doughnut to a circle, so the fundamental group of the doughnut is \(\pi_1(S^1 \times D^2) \cong \mathbb{Z}\).

But what does any of that even mean? Let’s visualize the universal cover of the doughnut. Let us start with Bob, who lives in the following little world:

When Bob drives along the road past the stop sign, he will eventually return to the same point at which he started. However, he would reach invisible walls, it he intended to walk along the grass, or jump to the sky. The road is the only way to get around, and it loops back on itself.

Topologically, Bobs world is a doughnut, and the road is a loop around the hole of the doughnut. If we were to stretch it (don’t worry, this wont harm Bob, he does not even notice!), we would get the following picture:

These kinds of worlds typically occur in video games, and famously the ideas which I will elaborate on in this post were used in the Super Mario 64 community to “hack” the game to achieve various speedrunning goals:

(This is where the famous “but first, we will need to talk about parallel universes” quote comes from. In the SM64 community, covering spaces are known as “parallel universes”). Wheather intentional (like in Pacman) or unintentional (like in Super Mario 64, due to casting floating point numbers to short ints), it is often the case that walking in a straight line long enough will eventually lead you back to the same point.

In the case of SM64 however, only the collision detection code is affected by this floating point arithmetic, but not the rendering engine. So there actually are some differences between the “parallel universes” in SM64. In particular, when Mario moves to a different parallel universe, the same collisions are still detected, although he is rendered as floating in an empty space. We can encode this also in Bobs world:

Lets assume, we would double, triple, quadruple, etc. Bobs world and stack these copies on top of each other. Bob would never notice, as long as each copy is just a “shallow” copy, meaning that each movement in the fundamental domain (the original world) is mirrored in the copies. In this case, we would have a covering space of Bobs world, and the original world would be the base space.

You could now imagine if we would do the same for SM64. Only there, we would not render any of the terrain, but only Mario himself for the copies.

If we were to keep stacking these copies, we would get the following picture:

Bob’s world stacked \infty times — Bob’s world stacked \(\infty\) times

The infinitely stacked world \(D^2 \times \mathbb{R}^2\) is the universal cover of Bobs world \(S^1 \times D^2\), as it is simply connected and covers Bobs world (see the definition of universal cover above).

Caution

I sometimes changed the ground texture and was too lazy to rerender all the images, so please excuse this inconsistency in the images.

Further elaboration regarding Super Mario 64

The universal covering of Super Mario 64s maps[^1] is actually what Bismuth describes in his video: “Super Mario 64 Tool-assisted speedrun world record explained”. The worlds of SM64 or rather the maps in Super Mario 64, are all homeomorphic to \(T^3\) (meaning they are pretty much the same, see the definition of homeomorphisms). The \(T^3\) is the 3-dimensional torus, which is the product of three circles. Look at this short clip from his video:

Warning

Because Super Mario 64 maps still use only IEEE‑754 32bit floating point numbers for positions, they are only a covering space, not the universal covering space of the collision detection space, which uses short ints I will elaborate below.

(Video by Bismuth) The world of Super Mario 64 is more or less a 3-dimensional torus \(T^3\). Marios position is stored as a float, but cast down to a short for the collision detection, meaning that only values up between \(-32768\) and \(32767\) are actually detected as different positions for the collision detection. Therefor marios position detection is calculated in \(B:= (\mathbb{R}/65536\mathbb{Z})^3\), and his actual position is calculated in \(P:=(\mathbb{R} / (1.17549435 \times 10^{-38})\mathbb{Z})^3\), here we still need to mod out, because of Nintendo 64 IEEE‑754 floating point arithmetic.

Casting from float to short gives rise to the retraction \(\rho: P \to B\) and its section \(\sigma: B \to P\) which lifts \((0,0,0)\) to \((0,0,0)\), is exploted in SM64 speed running. In particular \(P\) is a covering space of \(B\) with fiber \(F:= \ker(\rho)\), which are all points, where Mario transitions from one “parallel universe” to the next.

In SM64 this covering space structure is used by giving Mario extremely high velocity \(v\) (a tangent vector at Marios position in \(P\)) through a different glitch called “backwards longjump”. The exponential map is then used to move Mario along the geodesic \(\exp_v(t)\) for \(t \in \mathbb{R}\). But due to hardware limitations, only discrete steps \(t_1, \dots, t_k\) for some \(k \in \mathbb{N}\) of this geodesic are actually calculated for collision detection. More on how this is related to the exponential map of Lie groups can be found in my previous post. Throughout this post, I will assume the Lie group of SM64 to have its unit at the position \(p\).

By carefully choosing Marios position \(p in P\) and velocity \(v in T_p P\), the SM64 community was able to reach a desired positions \(q\) in \(B\) up to collision detection (for reaching some door, collecting a star, etc.). They did this by checking which \(t_1, \dots, t_k\) SM64 actually uses for calculations and then making the right choices, so that \(\rho(\exp_v(t_i)) = q\) for some \(i\) and for all other \(j \neq i\) \(\rho(\exp_v(t_j))\) is not a position that would trigger a collision detection with negative consequences (like resetting \(v\)).

Schematic of picking the correct v as shown in Bismuths Video — Schematic of picking the correct \(v\) as shown in Bismuths Video

Hyperbolic geometry from the universal cover \(\widetilde{S^1 \vee S^1}\)

In the previous example we considered a space which has only one road looping back on itself. The number of times you would walk around the road to get back to the “same” point (or an equivalent point in a different copy) can be encoded using this “winding number” trick:

Assume Bob had a dog “Snoopy” on a leash and it walks along the road, while Bob was standing still. If the dog follows the road once and comes back to Bob, this results in the leash being wrapped around the hole in his space: In other words, Bob would need to walk around the hole once to untangle it.

In fact, we could encode in which copy of Bobs world Snoopy is, by counting how many times the leash is wrapped around the hole.

This “wrapping a leash around the hole” action that Snoopy can perform is actually a group action of \(\mathbb{Z}\) on the base space (Bobs space): For each integer \(n \in \mathbb{Z}\), Snoopy can wrap the leash around the hole \(n\) times, and \(n-m\) corresponds to Snoopy walking around the hole \(n\) times clockwise and then \(m\) times counterclockwise.

Note

Definition 4 (Definition: Fundamental Group) Let \(X\) be a topological space and \(x_0 \in X\) a basepoint. A loop based at \(x_0\) is a continuous map \(\gamma: [0,1] \to X\) with \(\gamma(0) = \gamma(1) = x_0\). Two loops \(\gamma, \delta\) are homotopic relative to \(x_0\) (written \(\gamma \simeq \delta\)) if there exists a continuous map \(H: [0,1] \times [0,1] \to X\) such that \[H(s,0) = \gamma(s), \quad H(s,1) = \delta(s), \quad H(0,t) = H(1,t) = x_0\] for all \(s,t \in [0,1]\). This is an equivalence relation; denote the equivalence class of \(\gamma\) by \([\gamma]\).

The fundamental group of \(X\) at \(x_0\) is \[\pi_1(X, x_0) \;:=\; \bigl\{[\gamma] \mid \gamma \text{ is a loop based at } x_0\bigr\}\] equipped with the group operation of concatenation: \[[\gamma] \cdot [\delta] := [\gamma * \delta], \qquad (\gamma * \delta)(s) := \begin{cases} \gamma(2s) & s \in [0,\tfrac{1}{2}] \\ \delta(2s-1) & s \in [\tfrac{1}{2},1] \end{cases}\] The identity element is the class of the constant loop \([c_{x_0}]\), and the inverse of \([\gamma]\) is \([\bar\gamma]\) where \(\bar\gamma(s) := \gamma(1-s)\).

If \(X\) is path-connected, the isomorphism type of \(\pi_1(X,x_0)\) is independent of the basepoint \(x_0\), and we write simply \(\pi_1(X)\).

If we would think of the invisible walls in Bobs world as actual walls, say by a street lamp, we would get the following picture:

Snoopy being tangled -1 times around the hole/lamp — Snoopy being tangled \(-1\) times around the hole/lamp

Snoopy being tangled 0 times around the hole/lamp — Snoopy being tangled \(0\) times around the hole/lamp

Snoopy being tangled 1+1=2 times around the hole/lamp — Snoopy being tangled \(1+1=2\) times around the hole/lamp

So we have that one line removed in Bobs space results in an infinite (\(\mathbb{Z}\)) amount of copies in the universal cover, but what is we would remove two lines?

A world with two lines removed. Now there are two ways (red and blue) for a leash to get tangled up and “all their combinations” (there are two generators of the fundamental group of this space)

Let’s “unfold” this world again, like we did with the torus and the world, where moving through the wall along the road “teleported” Bob back to the other side. I will place a small house and some water into this barren world, so bob can have a nice place to live:

Unfolding the space with two lines removed (Bobs home) to a double doughnut.

Another view of the above unfolding (i accidentally forgot to transform the water though).

Warning

This is not the space where Super Mario 64 takes place! SM64 would be the space \(T^3\) (a 3-dimensional torus). We would get this space by removing putting portals from the left to the right, from the top to the bottom and from the floor to the ceiling in a box, which was Bobs road-trip world from before!

In this double dougnut world, the fundamental group would be the free group on two generators:

And if we form the universal cover of this space, we get the following hyperbolic space (notice that I shrinked copies the the base space/“parallel universes”), because hyperbolic space does not fit into the Euclidean plane:

The universal cover of the double doughnut is a hyperbolic space

The universal cover of the double doughnut is a hyperbolic space, where we only shrink the xy plane, but not the height

Definition: Hyperbolic Space

The \(n\)-dimensional hyperbolic space \(\mathbb{H}^n\) is the unique (up to isometry) simply connected, complete Riemannian manifold of constant sectional curvature \(K = -1\).

Two standard models make this concrete:

Poincaré half-space model. Take the open upper half-space \[\mathbb{H}^n := \{(x_1,\dots,x_n)\in\mathbb{R}^n \mid x_n > 0\}\] equipped with the Riemannian metric whose components on the coordinate tangent vectors are \[g_{ij}(p) \;=\; g_p\!\left(\frac{\partial}{\partial x_i}\bigg|_p,\; \frac{\partial}{\partial x_j}\bigg|_p\right) \;=\; \frac{1}{p_n^2}\,\delta_{ij}.\] Geodesics are semicircles (or rays) orthogonal to the boundary hyperplane \(\{x_n = 0\}\).

Poincaré disk model. Take the open unit ball \(\mathbb{D}^n = \{x\in\mathbb{R}^n \mid \|x\|<1\}\) with metric \[g_{ij}(p) \;=\; g_p\!\left(\frac{\partial}{\partial x^i}\bigg|_p,\; \frac{\partial}{\partial x^j}\bigg|_p\right) \;=\; \frac{4}{\bigl(1-\|p\|^2\bigr)^2}\,\delta_{ij}.\] This model is conformal: angles between curves equal their Euclidean counterparts.

Each point \(p \in \mathbb{H}^n\) has tangent vectors \(\frac{\partial}{\partial x^i} in T_p M\) (which we write as the partial derivatives) at \(p\) given local coordinates (i.e. a basis \(\text{span}\{x^1,\dots,x^n\} = T_p M\)). The collection \(\bigl\{\frac{\partial}{\partial x^1}\big|_p,\dots,\frac{\partial}{\partial x^n}\big|_p\bigr\}\) forms a basis of \(T_pM\).

A Riemannian metric on a smooth manifold \(M\) is a family of inner products \[g_p : T_pM \times T_pM \;\longrightarrow\; \mathbb{R}, \qquad p \in M,\] varying smoothly in \(p\), such that each \(g_p\) is symmetric and positive-definite. In local coordinates the metric is completely determined by its values on basis tangent vectors: \[g_{ij}(p) \;:=\; g_p\!\left(\frac{\partial}{\partial x^i}\bigg|_p,\; \frac{\partial}{\partial x^j}\bigg|_p\right), \qquad g_{ij} = g_{ji},\] with the matrix \((g_{ij}(p))\) positive-definite at every point. The length of a tangent vector \(v = \sum_i v^i \frac{\partial}{\partial x^i}\in T_pM\) is then \(\|v\|_g = \sqrt{\sum_{i,j} g_{ij}(p)\, v^i v^j}\).

For the Poincaré half-space model in dimension 2, the metric evaluates on the coordinate tangent vectors \(\frac{\partial}{\partial x}, \frac{\partial}{\partial y} \in T_pM\) as \[g_p\!\left(\frac{\partial}{\partial x^i}\bigg|_p,\;\frac{\partial}{\partial x^j}\bigg|_p\right) = \frac{1}{y^2}\,\delta_{ij},\] i.e. the coordinate tangent vectors are orthogonal and each has length \(\frac{1}{y}\) — shrinking to zero as \(p\) approaches the boundary \(y\to 0\), which is what makes the space “infinitely large” near the boundary.

Often people write these metrics as \(ds^2 = \sum_{i,j} g_{ij}\,dx^i\,dx^j\), where each \(dx^i\) is a covector (1-form), i.e. an element of the dual space \(T_p^*M\). For finite dimensional vectorspaces there is a canonical isomorphism between them and their dual: given the coordinate basis \(\bigl\{\frac{\partial}{\partial x^1},\dots,\frac{\partial}{\partial x^n}\bigr\}\) of \(T_pM\), there is a unique dual basis \(\{dx^1,\dots,dx^n\}\) of \(T_p^*M\) defined by \[dx^i\!\left(\frac{\partial}{\partial x^j}\right) = \delta^i{}_j.\] This extends to isomorphisms \(T_pM \to T_p^*M\). Under this identification, the bilinear form \(g_p\) on \(T_pM \times T_pM\) is represented by the symmetric tensor \(\sum_{i,j} g_{ij}\,dx^i \otimes dx^j\) acting on pairs of tangent vectors via \[\left(\sum_{i,j} g_{ij}\,dx^i\otimes dx^j\right)\!\!\left(\frac{\partial}{\partial x^k},\frac{\partial}{\partial x^l}\right) = g_{kl},\] which recovers exactly the inner products \(g_p\!\left(\frac{\partial}{\partial x^k},\frac{\partial}{\partial x^l}\right)\) from before. So both descriptions carry identical information;

For this reason, we may also write for the Poincaré half-space model in dimension 2:

\[ds^2 = \frac{1}{y^2}\,(dx^2 + dy^2).\]

The key difference from Euclidean geometry is that a circle of radius \(r\) has circumference \(2\pi\sinh r > 2\pi r\), and volumes grow exponentially rather than polynomially: \[\operatorname{Vol}(B(r)) = \operatorname{Vol}(S^{n-1})\int_0^r \sinh^{n-1}(t)\,dt.\]

How does this relate to glitches/explots?

I got asked this question by Bartfeels24 on Reddit (see below). This is my reply, if you are still wondering about the why?:

In SM64, positions are stored as tuples of three floats. However, a programmer at nintendo thought that casting to short ints for collision detection would be fine. (a completely valid idea tbh, after all Mario was never intended to move out of bounds).

Casting floats to short ints however implicitly calculates a mod operation (in fact mod 65536 for each float in the tuple).

This little oversight can be exploited: Say mario wants to collect a star to finish a level. He needs to position himself somewhat close to the star, but only his position after modulo 65536 is used for collision detection. We can use another exploit to make Mario gain massive velocity and the physics engine will allow him to clip through walls with it. However, with great velocity comes the price of leaving the map (going out of bounds). Therefor, we use the module operation to still force a collision detection with the star.

This technique is more deeply rooted. Choosing wrong datatypes, or casting without care leaves you open to attacks. Whenever you cast some data structure to another one by “removing” information, such attacks can happen:

Real life example: Hacking your bank to get infinite money

Say, your bank would internally work with arbitrary floating point numbers for money, but they only charge you during transfers the rounded value up to a cent. If you transfer 0.009€ to another account, this account would get 0.009€, but you account would get charged 0.00€. Infinite money glitch irl.

But this really happens every time you perform some operations after “casting” (retracting) your data to a smaller data type for some processing

Further references:

I drew heavy inspiration from this video “Not Knot” by the Geometry Center / Geometry Supercomputer Project in 1991, directed by Charlie Gunn and Delle Maxwell.
A video about different geometries on \(\mathbb{R}^ 3\)