Data-soup with long noodles


This is a post about Unity’s ECS (as of Unity 6) and how it handles dependencies between jobs.

The other day a friend asked: “Why are the different methods of scheduling work in Unity’s ECS so inconsistent? Some of them implicitly wait for the jobs we know that they depend on, some of them don’t. It’s frustrating and confusing!” I want to answer that and provide some perspective and context.

Let’s start at the basics: Unity’s ECS integrates with Unity’s job system, and there are different notions of dependencies at play here that I’d like to untangle first.

The job system

On the lowest layer, there is Unity’s job system. The job system allows you to either “schedule” or “run” jobs. When you schedule a job, you get a handle for it which you can use to refer to it, wait for it, or check whether it has finished. When scheduling, you can also pass one or more job handles as dependencies, which tells the job system “do not run this job until those dependencies have finished.” When you “run” a job, you ask the job system to take the job you have given it and execute it immediately on the calling thread. You cannot pass dependencies in that case, and you have to manually ensure that it is safe to run the job right now.
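To make this concrete, here is a minimal sketch. AddOneJob is made up, but Schedule, Run, and Complete are the actual Unity.Jobs API:

using Unity.Collections;
using Unity.Jobs;

struct AddOneJob : IJob
{
    public NativeArray<int> Values;
    public void Execute()
    {
        for (int i = 0; i < Values.Length; i++)
            Values[i] += 1;
    }
}

// ...somewhere on the main thread:
var values = new NativeArray<int>(128, Allocator.TempJob);
JobHandle first = new AddOneJob { Values = values }.Schedule();
// Pass `first` as a dependency: the second job will not start until the first has finished.
JobHandle second = new AddOneJob { Values = values }.Schedule(first);
second.Complete(); // block the main thread until both are done

// Or execute immediately on the calling thread. No dependencies can be passed
// here; it is on you to ensure no scheduled job still touches `values`.
new AddOneJob { Values = values }.Run();
values.Dispose();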

Why would you ever want to run a job instead of scheduling it? Isn’t moving work off the main thread the entire point of the job system? There are multiple answers here: First, jobs were historically the only place where you could use the Burst compiler (that has changed over time), so running a job was a way of getting Burst’s codegen even on the main thread. Second, you may have written code as a job and later decided that you would rather run it on the main thread; having a method for that minimizes changes. Third, jobs in Unity are a neat mechanism for being explicit about what data you are reading and writing, and you can benefit from that explicitness even on the main thread.

The jobs debugger

This brings me to the next piece of tech up the stack that deals with dependencies: the jobs debugger. Unity has a bunch of different safety mechanisms in DOTS, and it’s easy to mix them up, so let’s talk about them briefly. In C#, there are checks guarded by ENABLE_UNITY_COLLECTIONS_CHECKS (e.g. index range-checks in NativeArray), and those only ever execute in the editor. For all intents and purposes, ENABLE_UNITY_COLLECTIONS_CHECKS is equivalent to UNITY_EDITOR. Then there are checks guarded by UNITY_DOTS_DEBUG, which are all sorts of runtime checks that you can enable in a build; UNITY_DOTS_DEBUG covers things like index range-checks in, say, NativeList. Frustratingly, UNITY_DOTS_DEBUG cannot cover anything in NativeArray, because NativeArray ships with the pre-compiled Unity C# assemblies, which are unaffected by defines set in your project. One thing that ENABLE_UNITY_COLLECTIONS_CHECKS does cover is the so-called AtomicSafetyHandle members of various data structures, which hence only exist in the editor.
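If you have ever written a custom container, you have seen where that handle lives. This is an abbreviated sketch (a real container needs [NativeContainer] and a few more attributes), just to show the editor-only field:

using Unity.Collections.LowLevel.Unsafe;

public unsafe struct MyContainer
{
    [NativeDisableUnsafePtrRestriction]
    void* m_Buffer;

#if ENABLE_UNITY_COLLECTIONS_CHECKS
    // Editor-only: the jobs debugger uses this handle to identify this
    // particular container instance when tracking dependencies.
    internal AtomicSafetyHandle m_Safety;
#endif
}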

The jobs debugger exists only in the editor, and it uses these AtomicSafetyHandle members to track what things a job you schedule touches. When you schedule a job from C#, the jobs debugger uses a form of reflection to look at the struct that represents the job. Then it looks at all of the different containers you are using, checks whether you have annotated the fields that contain them with ReadOnly, and uses that information to register the job you are scheduling into an internal graph. This works because each container contains an AtomicSafetyHandle, which the jobs debugger uses to identify that specific container instance. There are a lot of details and nuances to this, but with a lot of work the jobs debugger can now raise an error if you schedule (or run!) a job but forgot to either specify a dependency or manually wait for it: for example, your job reads from a container that another job writes to, but you did not specify that job (or one of its dependents) as a dependency.
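Here is the kind of mistake it catches. WriterJob and ReaderJob are hypothetical, but the error is real:

using Unity.Collections;
using Unity.Jobs;

struct WriterJob : IJob
{
    public NativeArray<float> Data; // writable by default
    public void Execute() { Data[0] = 42f; }
}

struct ReaderJob : IJob
{
    [ReadOnly] public NativeArray<float> Data; // tells the jobs debugger this job only reads
    public void Execute() { float unused = Data[0]; }
}

// ...on the main thread:
var data = new NativeArray<float>(64, Allocator.TempJob);
JobHandle writer = new WriterJob { Data = data }.Schedule();

// Editor-only error: this reader races with `writer`, and we neither passed
// the writer's handle as a dependency nor completed it first.
new ReaderJob { Data = data }.Schedule();

// Fine: the dependency is declared.
new ReaderJob { Data = data }.Schedule(writer);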

The jobs debugger goes to great lengths to give you useful, consistent errors. What this means, in short, is this:

  • The jobs debugger can tell what dependencies your job should have by inspecting AtomicSafetyHandle fields nested in your struct.
  • The jobs debugger (and AtomicSafetyHandle) does not exist at runtime in a build of your game. It cannot use this information to make scheduling decisions for you.
  • The jobs debugger is conservative and raises errors for problems that could happen (vs. those that actually do happen). You still need to be careful if your game build behaves differently from code running in the editor.
  • The jobs debugger is completely independent of ECS.

I am a big fan of the jobs debugger. It has some shortcomings, but it is absolutely worth having: the data races it catches would otherwise be painful to track down.

The ECS itself

Unity’s ECS also tracks dependencies between jobs, but it does that in both the editor and in builds, and it may also raise safety errors. It is, however, separate from the jobs debugger. For each component type in your game, the ECS tracks which jobs are reading from and writing to that component type by storing the relevant job handles. (We will get to how it knows that in a minute.) These handles can be used to pass dependencies to systems, wait for all jobs that touch one component type, or wait for all jobs that interact with the ECS. The latter is important because structural changes to entities (create/destroy, add/remove components) can’t happen in parallel to jobs running on ECS data. Similarly, if some main thread code anywhere decides to use the EntityManager to look up some component value, then that should be caught and either raise an error (“hey, there are jobs running!”) or implicitly wait for those jobs before getting that data.
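As a mental model (this is emphatically not Unity’s actual implementation, just a sketch of the bookkeeping the above implies), you can picture one handle per component type for the last writer and one combined handle for all readers:

using System;
using System.Collections.Generic;
using Unity.Jobs;

class PerTypeDependencies
{
    readonly Dictionary<Type, JobHandle> lastWriter = new();
    readonly Dictionary<Type, JobHandle> allReaders = new();

    // A job that reads `type` must wait for the last writer of that type.
    public JobHandle ReadDependency(Type type) =>
        lastWriter.TryGetValue(type, out var w) ? w : default;

    // A job that writes `type` must additionally wait for all of its readers.
    public JobHandle WriteDependency(Type type) =>
        JobHandle.CombineDependencies(
            ReadDependency(type),
            allReaders.TryGetValue(type, out var r) ? r : default);
}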

Systems in ECS

Systems are the basic unit of code organization in ECS, and that is where Unity’s implementation tracks who is reading or writing what components. In earlier versions of Unity’s ECS, there was the JobComponentSystem whose update signature looked something like this:

JobHandle Update(JobHandle dependencies);

This means that the outer code of the ECS would figure out what dependencies the jobs scheduled by this system have, pass those dependencies into the Update function, and then have the Update function return a job handle that represents all the jobs scheduled by that system (so even if your system schedules 15 independent jobs, there is only one job handle at the end: a single “virtual” job that depends on your 15 actual jobs). The outer code of the ECS can then query what component types your system reads and writes and use that to update the per-component-type job handles with the handle your Update returned. Modern Unity ECS code no longer has JobComponentSystem; it uses ISystem instead, but the Dependency property on SystemState works the same way: when you schedule an ECS-aware job (e.g. IJobEntity) in a system, it will implicitly (through codegen) take the Dependency property as a dependency and write the handle of the scheduled job back into that property. If you schedule a non-ECS-aware job (like IJob) in a system, none of that happens implicitly. This inconsistency is a consequence of trying to “make IJobEntity always do the right thing.” It is hidden additional complexity, and quite frankly terrible, because it does not happen consistently across different job types.
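In code, the difference looks like this. MoveJob and CopyJob are hypothetical, but the Dependency wiring is what actually happens:

using Unity.Burst;
using Unity.Entities;
using Unity.Jobs;
using Unity.Transforms;

public partial struct MoveJob : IJobEntity
{
    void Execute(ref LocalTransform transform) { /* ... */ }
}

struct CopyJob : IJob
{
    public void Execute() { /* ... */ }
}

[BurstCompile]
public partial struct MySystem : ISystem
{
    public void OnUpdate(ref SystemState state)
    {
        // ECS-aware: codegen passes state.Dependency in as a dependency and
        // writes the new handle back into state.Dependency.
        new MoveJob().ScheduleParallel();

        // Not ECS-aware: nothing implicit happens. If you forget to chain
        // state.Dependency yourself, the ECS does not know about this job.
        state.Dependency = new CopyJob().Schedule(state.Dependency);
    }
}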

How does the ECS know what component types a system is reading? Excellent question. Whenever you acquire an EntityQuery, a ComponentTypeHandle, or a ComponentLookup through SystemState, the system tracks that and registers it. (There is some behind-the-scenes magic using AtomicSafetyHandle instances to ensure that the jobs debugger sees the same dependencies.) Using IJobEntity or similar abstractions does the same under the hood.
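For example (inside OnUpdate; Health is a hypothetical component type):

// Each of these calls registers read or write access on the system, which is
// how the ECS later knows what the jobs scheduled by this system may touch.
var lookup = state.GetComponentLookup<Health>(isReadOnly: true);      // read access
var handle = state.GetComponentTypeHandle<Health>(isReadOnly: false); // write access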

So let’s recap:

  • The ECS tracks which jobs read from or write to a component type by looking at what data the system could be using and then looking at the jobs scheduled by that system.
  • When you schedule two jobs in a system, the ECS cannot distinguish between them. They will get the same dependencies, and if you are not careful they will also depend on each other (through the Dependency property); see the sketch below.
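Here is that pitfall and the manual escape hatch, sketched with two hypothetical jobs:

public void OnUpdate(ref SystemState state)
{
    // Implicit chaining: Job2 ends up depending on Job1 even if the two
    // touch completely disjoint components.
    //   state.Dependency = new Job1().Schedule(state.Dependency);
    //   state.Dependency = new Job2().Schedule(state.Dependency);

    // Manual fan-out: both jobs depend only on the system's input handle,
    // and their combined handle is what the ECS records for this system.
    JobHandle input = state.Dependency;
    JobHandle h1 = new Job1().Schedule(input);
    JobHandle h2 = new Job2().Schedule(input);
    state.Dependency = JobHandle.CombineDependencies(h1, h2);
}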

Is this good?

Let’s consider a quick example. You have component types A, B, C, D. You have a system that schedules two jobs: job 1 reads component A, and job 2 reads B and writes C. From the ECS perspective, we just know that your system reads A and B and writes C. So before your system updates, the Dependency property is set to the combined job handles of “writers of A”, “writers of B”, and “readers and writers of C”. Job 1 now also depends on “writers of B” and “readers and writers of C”, even though it only reads A. Because you used implicit dependency management, job 2 additionally depends on job 1. After the update, the Dependency property is a job handle representing job 2, and we record it as a “reader of A”, “reader of B”, and “writer of C”. Now it turns out that the unrelated component type D is read in a job that writes component type A. That job gets “readers and writers of A” and “writers of D” as dependencies, which also happens to include the jobs that write to the unrelated component C. Maybe there is only a single entity with A and D but ten thousand with A and C. Should they be related now?

At scale, it becomes almost impossible to avoid a situation where everything depends on everything else, at least a little bit. Fundamentally, I believe that this approach to handling dependencies turned out to be wrong. It is impossible to reason through these dependencies, and even with proper tooling (which Unity lacks), I would predict that the tool would just tell you “you’re screwed” in the form of a large, entangled graph. The ideas of “recombine components to drive behavior” and “handle dependencies on a component level” are incompatible. There are tons of other issues here as well: chunk fragmentation, archetype explosion, “help, adding a physics component to an entity affects rendering performance”, and so on.

When I think of Unity’s ECS nowadays, the picture that comes to mind is that of soup: soup with long noodles that make it hard to eat without spilling. “Soup” is incidentally also the online nickname of a friend who would likely agree with this. (Hi, Adam!) This is not to say that Unity’s ECS is not an improvement over game objects (I think it is!), but in terms of whether it is a good answer to multi-threading, the answer is a clear “no.”

But what is a good answer? Clearly, the “right” answer is that you should put everything into big arrays with handles into them and be super explicit about any sort of multi-threading. That is a good answer if you can afford a team of experts who know exactly what they are doing. For a game engine like Unity, it is probably not the right answer. ECS does solve a bunch of problems: having a data model makes it simple to write generic tooling, and large parts of the programmer population want to think in terms of “things” instead of “bytes.” It might not be technically optimal, but it needs to serve other, non-technical requirements as well. A clear step forward in my book would be to stop archetype explosions and track jobs per archetype. The goal would be to isolate areas of the game as much as possible (“rockets never interact with toddlers”) and multi-thread that way. Oh, and to have a very good answer for single-threaded code: ideally, you want to move entire organizations’ worth of programmers onto different threads where they can pretend that they are single-threaded. Until then, my suggestion is: isolate your data, identify which parts of your game do not need to run in ECS and move them out completely, and then multi-thread that.

And please let me know if in your game, toddlers DO interact with rockets (peacefully). I’d love to see that game.
