I recently had the pleasure of encountering problems related to PDBs (the sidecar file your compiler emits) becoming too big. This post documents parts of the process, in the hope that it is useful for someone.
Imagine, if you will, a large C++ codebase that is compiled using Microsoft’s compiler, MSVC. If your imagination is sufficiently sophisticated, you may have just imagined a codebase that, even on a Threadripper, requires you to bump your pagefile beyond 250GB so you don’t run out of memory during parallel compilation. Add just a little bit more to your imaginary codebase and you might find that MSVC’s linker gives up:
1>LINK : fatal error LNK1201: error writing to program database ‘PATH’; check for insufficient disk space, invalid path, or insufficient privilege
Tensorflow, for example, looks like it has been one such unlucky codebase in the past.
So, when does that error happen? If you look at the MS docs, you’ll find that PDBs “can be up to 2GB”. If you try this for yourself, you’ll notice that the actual limit seems to sit at about 4GB, not 2GB. Either 2GB or 4GB would be a sensible value: there is probably a place where someone is using a uint32_t to index into something, and that then sets your limit.
But it turns out that this 4GB limit isn’t a hard limit. PDB files follow a stream-based format, and the streams are separated into pages. It’s actually a bit more complicated, but the 4GB limit relates to these pages, and others (LLVM) have already figured out that you can tweak the page size to allow for larger PDB files. You’ll see that you can go quite a bit beyond the 4GB limit using 2^20 pages. The LLVM link above states:
| Page Size | Max File Size |
| --- | --- |
| <= 4096 | 4GB |
| 8192 | 8GB |
| 16384 | 16GB |
| 32768 | 32GB |
Surely, 32GB should be enough for anyone. You can find some more discussion here; the TL;DR is that you should use /pdbpagesize:8192 or above. Job done! … Unless, of course, that is not the limit you are hitting, probably because you passed that limit a long time ago and are now failing to produce PDBs again. What do you do then? My googling, at least, did not reveal any other resources on this problem.
There is some advice from Microsoft on how to reduce PDB sizes here, or you can split up your program more granularly (e.g. multiple DLLs instead of a monolithic build), or you can just compile fewer things, or selectively disable debug information. One option you may reach for is /DEBUG:FASTLINK (docs), but I would strongly suggest you do not. My experience with fastlink is that debugging with that option is essentially impossible, so you may as well drop the symbols entirely. In my case, the VS debugger crashes frequently and any operation takes ~10 minutes or more. Profiling the debugger suggests that it is binary-searching through gigabytes of data repeatedly, constantly loading new things into memory only to drop them shortly afterwards, which vaguely makes sense: with fastlink, debug data is littered across object files, and at some point you have to pay for locating it.
It was not just curiosity but also the sheer necessity of those debug symbols that led me to debug link.exe to see what exactly goes wrong:
- My first guess was that the error indicates a failing write operation. You can inspect all those I/O calls using Process Monitor, but in this case it all looked fine. (If you find a failure this way, Process Monitor can even give you a full callstack.)
- My next approach was to load link.exe into a disassembler to see whether I could find the place that references the error message, and understand it well enough to form a better theory of why it was failing. That was not fruitful either, mostly because some interesting choices about which PE segment the error messages live in made the process more difficult than necessary. I think this approach could have worked with a little more persistence.
- Instead, I decided to profile the linking process with Superluminal. The idea here is that a profiler gives you a high-level overview of what is happening, and you can zoom in on the last thing the process does: with high likelihood, the last thing the linker does is what causes the problem (because it aborts, seemingly), and with binaries of this size all steps probably take long enough to show up in the profiler.
- This was a very fruitful idea and I learned two things. First, link.exe and most of the things it calls actually have symbols on Microsoft’s symbol server (I should not have been surprised; Microsoft is very good about this!). Second, the linker successfully generated the PDB but “merely” failed to write it out. There is a main thread that enqueues commands to a pool of workers, and when a worker executes the final “Commit” command, it fails and propagates the error to the main thread, which then prints it out.
- I used WinDbg to place some breakpoints around the failing commit code on the worker thread. It happens in mspdbcore.dll while the TPI stream (this contains all the type info, more below) is committed. This looked like a long debugging adventure at first, but Stefan Reinalter kindly reminded me that Microsoft released some PDB-related source code years ago, and I was able to match up some of the symbols. The code did not fully match the assembly I was seeing, but it was close enough to significantly speed up the process of understanding the disassembly. (Stefan is the author of RawPDB, and we had collaborated a few months earlier to fix a PDB-parsing crash in Superluminal caused by large PDBs. This gave me a good understanding of PDBs that came in handy here. Additionally, I think I may have solved the age-old question of “how to make friends as an adult”: debug an integer overflow bug together.)
- To my surprise, none of the “big” file I/O failed. TPI1::Commit (here) is the relevant code, and it starts off by writing out all the big stuff first. Yet somehow that is not the problem at all. No, to my great surprise, it actually fails all the way towards the end of the function, because it cannot write the 50 (!) bytes for the stream header at the start of the stream.
The functions involved in writing here explicitly state that they can only be used to overwrite existing parts of the stream, so we are not even adding more memory at this step. What happens is this:
- We try to write 50 bytes at offset 0. That is happening here.
- We validate the stream number (SN) for the TPI stream.
- Then we find the end of the stream and perform a bounds-check for our write: off + cbBuf <= GetCbStream(sn).
- Then we perform more bounds-checks in the readWriteStream function (here) before actually writing the data.
The last two checks fail, and that makes perfect sense once you consider that the type involved here, CB, is a signed integer type instead of an unsigned type, and the size of the TPI stream sat just above 2GB. That is the problem. Bumping up the page size will not help; the total size of the PDB is irrelevant; you are still limited to 2GB streams.
I was able to generate a “valid” PDB by patching some of these compares in mspdbcore.dll to be unsigned instead, i.e. replacing signed jumps with their unsigned counterparts (e.g. ja). The resulting PDB cannot be read by Microsoft’s tools, because the same sort of integer overflow happens during reading, but other tools like the excellent RawPDB library can read it, which is enough to analyze the PDB. There are of course good reasons for not just patching the binary and calling it a day; who knows in what other places this is overflowing. But the fact that other tools do accept streams above 2GB, and that the number of code paths that actually need the raw offset into the file is probably very limited, makes me hopeful that Microsoft can resolve this in source, without touching the format at all, to at least go to 4GB. The 4GB limit is probably harder to fix, since the TPI stream header has just 32 bits for the size in bytes.
To work around the issue, I ended up disabling debug information for some parts of the program. That is rather anticlimactic, because it is just what you would do anyway to reduce total PDB size, but it was a sensible step. It freed up 400MB in the TPI stream, which should be enough to work around the issue until a more permanent solution is found.
…so imagine my surprised face when the same issue reappeared less than 4 weeks later! It turns out you can generate a lot of type data very quickly with just a few lines of code in the right place. Surprisingly, this is not related to using types for meta-programming at all: a type seems to show up in the TPI stream only when an actual value of that type appears in the program. Instead, consider what happens when you put a local struct or a lambda (or both!) into a widely-used macro like an assertion macro: this easily generates a hundred thousand types. It is worth pointing out that the change in this case was a revert; someone else had learned that same lesson months before me, and a technicality mandated reverting the changes before fixing them up again.
Finally, a few words about the TPI stream. The TPI stream is where all of the type information lives. It contains types, their members, and all their metadata. It’s essentially a long sequence of records that describe different aspects of a type. Here is what I learned about it in the process:
- Records don’t just describe types but also their members, and a whole host of different things. A type probably consists of many different records, one for the type itself, one for the list of fields, many for the different members, which will refer to other records for their types etc.
- Records might also just be modifiers for other things, such as “const pointer to this other type.” For a big program, it’s hence not unreasonable to have millions of such records.
- Records contain the raw names of types, members, and methods, which is the main reason for their huge size. Probably about half of the stream is just strings. There is no compression.
- Long type names (unsurprisingly) come from heavily templated code, but not all template usage ends up putting something there. I have not seen std::conjunction in there, for example, or anything else that does not have a value of that type in the program.
- Nested types in particular can contribute more than you would think. Every nested type adds to the size of the outer type, because the outer type actually stores the full name of the inner type. I have seen single types contributing more than 60MB to the TPI stream because of this.
- You can check out the size of your PDBs’ TPI streams by running the first few lines of the PDB size example of the RawPDB library.
Here are some guidelines for commonly used code that can help when your TPI stream is getting too big (and you can’t change how you link or disable debug symbols):
- Do not put lambdas or local types into macros.
- Pull out nested types wherever possible. Nested types in templated types are an easy way to generate many types. Similarly, putting a public template type into another type can quickly lead to problems, because you are now paying for an unbounded number of nested types.
- Push member functions up into base classes wherever possible, especially when you have a pattern where the base is concrete and the child is templatized. Consider adding such a base for members that do not require the template parameters.
The techniques here are very similar to those for reducing code bloat. I have quite a few ideas for tooling around this, but hopefully it will not be necessary to put more time into it (beyond the little tooling I added to RawPDB as an example).