What is the structure of a PDB file?

I am trying to understand how a debugger uses a PDB file. It would probably be a small file system in itself. Could someone help me understand the structure of a PDB file?

According to this blog post, the actual file format is kept secret by Microsoft. However, I recommend reading that post, as it has a lot of useful information about what a PDB file is and how it's used.

According to MSDN, it's impractical:
Because the format of the .pdb file generated by the postcompiler tools undergoes constant revision, exposing the format is impractical.
A debugger would use the DIA SDK (Debug Interface Access) to access the data inside PDB files, so you don't really need to know their structure.

As a matter of fact, the PDB format is not documented, but you can collect very detailed information about the contents of PDB files programmatically using the appropriate interfaces. See the sample.
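To make the "small file system" intuition from the question concrete, here is a minimal Python sketch of the container layer (called MSF) that sits underneath a PDB. It assumes the MSF 7.00 layout as described in the LLVM project's documentation of the PDB format, not a Microsoft-published specification: a 32-byte magic string followed by six little-endian 32-bit fields (block size, free block map block, block count, directory size in bytes, an unused field, and the index of the block that lists the directory's blocks). The function names are hypothetical, and the sketch only recovers stream sizes; interpreting the streams themselves is the job DIA does for you.

```python
import struct

# Assumption: MSF 7.00 magic for PDB 7.0 files, as documented by the LLVM project.
MSF7_MAGIC = b"Microsoft C/C++ MSF 7.00\r\n\x1aDS\x00\x00\x00"


def read_msf_superblock(data: bytes) -> dict:
    """Parse the fixed-size header ("superblock") at the start of the file."""
    if not data.startswith(MSF7_MAGIC):
        raise ValueError("not an MSF 7.00 container")
    fields = struct.unpack_from("<6I", data, len(MSF7_MAGIC))
    keys = ("block_size", "free_block_map", "num_blocks",
            "directory_bytes", "unknown", "block_map_addr")
    return dict(zip(keys, fields))


def read_stream_sizes(data: bytes) -> list:
    """Reassemble the stream directory from its blocks and return each stream's size."""
    sb = read_msf_superblock(data)
    bs = sb["block_size"]
    # The directory may span several blocks; their indices are stored in block_map_addr.
    n_dir_blocks = -(-sb["directory_bytes"] // bs)  # ceiling division
    dir_blocks = struct.unpack_from(f"<{n_dir_blocks}I", data, sb["block_map_addr"] * bs)
    directory = b"".join(data[b * bs:(b + 1) * bs] for b in dir_blocks)
    directory = directory[:sb["directory_bytes"]]
    # Directory layout: stream count, then one 32-bit size per stream, then block lists.
    (num_streams,) = struct.unpack_from("<I", directory, 0)
    return list(struct.unpack_from(f"<{num_streams}I", directory, 4))
```

Called on the bytes of a real PDB (for example `read_stream_sizes(open("app.pdb", "rb").read())`, with a hypothetical path), it should list one size per stream. Per the same LLVM documentation, a size of 0xFFFFFFFF marks an unused stream, and each stream's list of block indices follows the sizes in the directory.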

Related

Which tool to use to open .pdb (symbol) files?

I have a .pdb file downloaded from the Microsoft symbol server. I need to fetch a list of symbols (functions, arguments, anything it has). There is a tool on CodeProject, but it only reports modules. There is the DbgHelp API, but it can only be attached to a running process. How can I read a .pdb file offline?
Good news for anyone still looking: the information you seek is now open source!
https://github.com/Microsoft/microsoft-pdb
There is some really interesting stuff there, like the pdbdump.cpp file
with its dumpPublics function and its main control flow. Good documentation, too.
You can also use Visual Studio's Dia2Dump sample program to dump human-readable output from a PDB file, including its public symbols.
Be sure to build it as a 32-bit application though, or you might run into some problems with it. (See dia2dump: CoCreateInstance failed - HRESULT = 80040154)

Debugging with PDB files not generated at the same time as the binary: is it possible?

I tend to think it is not possible.
I know that the generated binaries and their counterpart PDB files are tied together at compile time.
Today I have to help debug dumps from a really old version of the product, for which the PDB was obviously generated at a different time.
Visual Studio faithfully tells me that the PDB does not correspond to my binary. (And I'm always grateful for that, given all the hours it has saved me when I wasn't using the right binary.)
However, this time the problem is different: I do want to use PDB files that are unrelated to the binary's compilation.
I seem to remember reading that it is nevertheless possible to use such a PDB if the source code was the same: I think it involves parsing the PDB, or correcting some timestamp value inside the PDB itself, or something like that. I may well be wrong about that.
So, even if it's hard, is there a way to use PDB files that were not generated at the same time as the binary, but from exactly the same code?
In WinDbg: .symopt+ 0x40 (SYMOPT_LOAD_ANYTHING, which loads mismatched PDBs)
Simply rebuilding the PDB from the same source may not produce exact results. Consider that the environment may have been updated in the meantime; you are probably using a newer version of the compiler, etc.
That said, there is
.symopt+ 0x40
for WinDbg, as @blabb already mentioned.
If you prefer debugging in Visual Studio, there is a tool that makes the PDB match the executable: Chkmatch.
chkmatch -m <exe> <pdb>
This copies the matching information from the executable into the PDB. It is a bit dangerous: if you forget that this is a modified PDB, you may later hunt for bugs that are not there. I recommend:
keeping the original, mismatched PDB
creating a batch file for copying the PDB and modifying it
deleting the modified PDB every day
Calling the batch file will remind you that you might get inexact results during your debug session.
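The batch-file routine above could be sketched in Python like this. It assumes chkmatch is on the PATH and accepts `-m <exe> <pdb>` exactly as shown above; the function name, the separate work directory, and the printed warning are illustrative choices, not part of Chkmatch. Because only the copy is patched, deleting the work directory (daily, as suggested) takes you back to the untouched original; Visual Studio's symbol path would need to include that directory.

```python
import shutil
import subprocess
from pathlib import Path


def make_patched_pdb(exe: Path, original_pdb: Path, work_dir: Path,
                     chkmatch: str = "chkmatch", run=subprocess.run) -> Path:
    """Copy the mismatched PDB aside and patch only the copy with chkmatch.

    The original PDB is never modified. `run` is injectable so the
    function can be dry-run without chkmatch installed.
    """
    work_dir.mkdir(parents=True, exist_ok=True)
    patched = work_dir / original_pdb.name
    shutil.copy2(original_pdb, patched)
    # "-m" copies the matching information from the executable into the PDB copy.
    run([chkmatch, "-m", str(exe), str(patched)], check=True)
    # The reminder the batch file is meant to provide.
    print(f"WARNING: {patched} is a patched PDB; debugging results may be inexact.")
    return patched
```

For example, `make_patched_pdb(Path("old.exe"), Path("old.pdb"), Path("C:/symbols-patched"))`, where all three paths are hypothetical.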

PDB files do not include all .NET source information

This may come down to my misunderstanding of PDB files and the build process rather than any particular problem, but I've struggled to find a good answer elsewhere.
We have recently been good little developers and started indexing and storing our PDB files on a central symbol server (all part of TFS). The problem is that our PDB files do not appear to include all the source information.
When trying to navigate to sources in Visual Studio, the Output window shows that the PDB files for our assemblies are found:
PdbNavigator: Downloader: file://server/Symbols/my.assembly.pdb/1DB3F79EA3094EAAADFC6CDE6515FC871/my.assembly.pdb -> ok, 251 KB
PdbNavigator: No debugging information found on symbol servers for my.assembly, Version=1.0.1.1206, Culture=neutral, PublicKeyToken=4cd79aeab39b919b
But at the same time, it says it found no sources. Using some of the tools from the Windows SDK, I can see that the PDB file does not contain information about roughly 30% of the source files in the project.
I think I read somewhere that PDB files only include the source for classes actually used within the project, but surely that creates a massive problem for any API-type assembly, where many classes may serve no function within the assembly itself and are only used from some other part of the project?
If anyone can shed light on this, please let me know.
Thanks.
A PDB (normally) doesn't store source code. It contains a list of "documents", which are the source file names, and "method information", which maps source lines to offsets in the assembly or binary. A PDB matches when the signature (a GUID) and age recorded in the assembly match those recorded in the PDB file. Chances are, MyAssembly.pdb is the right version, but the signature and/or age don't match.
The signature is not exposed as far as I know, but you can find code on the Internet that shows how to read a PE signature and a PDB signature so you can compare them.
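For Windows PDBs referenced through an "RSDS" CodeView record in the executable's debug directory, that signature is a GUID plus an age, and the symbol server path in the question's output is built from them: in `my.assembly.pdb/1DB3F79EA3094EAAADFC6CDE6515FC871/my.assembly.pdb`, the middle part is the GUID as 32 uppercase hex digits followed by the age (1). The sketch below assumes you have already located the raw RSDS record (for example via `dumpbin /headers` or a PE parser; that step is not shown), formats the age in hex as symbol stores do to the best of my knowledge, and does not cover Portable PDBs. `parse_rsds` and `symbol_server_key` are hypothetical helper names.

```python
import struct
import uuid


def parse_rsds(record: bytes) -> tuple:
    """Split an RSDS CodeView record into (guid, age, pdb_path)."""
    if record[:4] != b"RSDS":
        raise ValueError("not an RSDS (PDB 7.0) record")
    guid = uuid.UUID(bytes_le=record[4:20])  # first three GUID fields are little-endian
    (age,) = struct.unpack_from("<I", record, 20)
    # The PDB path follows as a NUL-terminated string.
    path = record[24:].split(b"\x00", 1)[0].decode("utf-8", errors="replace")
    return guid, age, path


def symbol_server_key(guid: uuid.UUID, age: int) -> str:
    """GUID as 32 uppercase hex digits, then the age in hex without padding."""
    return guid.hex.upper() + format(age, "X")
```

If the key computed from the assembly differs from the directory name the PDB was stored under, the two files do not match, which is the comparison the answer describes.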

Why would we need a map file when a PDB file is available on the Windows platform?

As the title says, I think the PDB file is a superset of the map file. I'm asking because I'm now responsible for maintaining an old system that produces both a PDB and a map file. I wonder whether the map file is unnecessary when the PDB file is available.
Thanks
I've also wondered about this and decided to see what John Robbins has to say in his book "Debugging Applications". He says that map files are "the only textual representation of your program's global symbols and source and line number information" and can be read without any supporting program. He goes on to say that Microsoft changes the symbol table format on a regular basis, and if you have a customer running a very old version of your program, it may be hard to find an old version of the symbol engine that can interpret the symbol table in the PDB files for that very old program. But since a map file is just a text file, you would easily be able to map a crash address to a symbol by simply opening the map file in Notepad!
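To illustrate Robbins's point, here is a rough sketch of that lookup done in code instead of by eye. It assumes MSVC "Publics by Value" lines have the shape `0001:00000000  _main  00401000 f  main.obj` (section:offset, symbol name, then the Rva+Base address in hex) and skips every line that doesn't fit. It also assumes the crash address is relative to the same preferred load address the map file was generated with; if the module was relocated, adjust the address first. The function names are hypothetical.

```python
import bisect
import re

# Assumed shape of a "Publics by Value" line: "0001:00000000  _main  00401000 f  main.obj"
PUBLIC_LINE = re.compile(
    r"^\s*[0-9A-Fa-f]{4}:[0-9A-Fa-f]{8}\s+(\S+)\s+([0-9A-Fa-f]{8,16})\b")


def load_publics(map_text: str) -> list:
    """Return (address, symbol) pairs from the map file, sorted by address."""
    publics = []
    for line in map_text.splitlines():
        m = PUBLIC_LINE.match(line)
        if m:
            publics.append((int(m.group(2), 16), m.group(1)))
    publics.sort()
    return publics


def symbolize(publics: list, address: int) -> str:
    """Name the closest public symbol at or below address, as 'symbol+0xoffset'."""
    addrs = [a for a, _ in publics]
    i = bisect.bisect_right(addrs, address) - 1
    if i < 0:
        return f"0x{address:X} (no symbol)"
    base, name = publics[i]
    return f"{name}+0x{address - base:X}"
```

This is the same "find the nearest lower address" reading you would do in Notepad, just automated with a binary search.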

Viewing Codeview symbols

I have an old game executable with a large section of debug symbols, apparently in the Codeview format. How can I view the contents of this section in a human-readable format?
Current Windows compilers do not put the debug symbols themselves into the image file; they only put a reference to an external symbols file into the image. The debug data goes into a separate symbols file with the PDB (Program Database) extension. As you mentioned, this format is also known as CodeView. In your case, since the debug section is large, it looks like you may be dealing with a really old symbol format.
This article explains the different symbol formats.
Okay, given that this is for 32-bit Windows, I believe the normal Windows symbol handler API should be able to read the data. From there, it's pretty much a matter of deciding what data you want, and how you want it formatted.
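A quick way to tell which case you're in is to look at the first four bytes of the debug data that the image's debug directory points to. The sketch below assumes you have already extracted that blob (for example after finding its file offset with `dumpbin /headers`). It checks only the well-known CodeView signatures: `NB09` and `NB11` for symbols embedded in the image, `NB10` and `RSDS` for references to an external PDB. Anything else is reported as unknown rather than guessed at, and the function name is hypothetical.

```python
# Well-known CodeView signatures (first four bytes of the debug data).
CODEVIEW_SIGNATURES = {
    b"NB09": "embedded CodeView symbols (NB09)",
    b"NB11": "embedded CodeView symbols (NB11)",
    b"NB10": "reference to an external PDB 2.0 file",
    b"RSDS": "reference to an external PDB 7.0 file",
}


def classify_debug_data(blob: bytes) -> str:
    """Describe a debug-data blob by its 4-byte CodeView signature."""
    return CODEVIEW_SIGNATURES.get(bytes(blob[:4]), "unknown or non-CodeView format")
```

A large debug section starting with `NB09` or `NB11` would be consistent with the "really old symbols format" explanation above, in which case the symbol handler API is the place to start.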
