What is the format for debug info in Windows obj files? - windows

I'm messing around with compilers, .obj files, assembly, etc. The .obj file contains info that eventually ends up in the PDB, but I can't find any reference to the format that's used within the debug sections of the .obj file. (I have, however, found a reference to the COFF file format -- so I already know about that).
So: What's the format of the .debug$S and .debug$T sections when the source C file is compiled with the /Zi flag?

This information (the format used for native PDBs) isn't published. If you can link the object file into an executable using "link", there are Windows debugging APIs you can use to interrogate symbols in an image. However, the format used for object files is not made publicly available.
You could try and reverse engineer it. If you find any info, please share it.
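As a starting point for that kind of reverse engineering, the debug sections can at least be located using only the documented COFF layout. This is a minimal Python sketch, assuming a classic object file (not one built with /bigobj, which uses a different header). The function name coff_sections is made up for illustration, and the sketch does not interpret the section contents.

```python
import struct

def coff_sections(obj_bytes):
    """List (name, raw_size, file_offset) for each section of a COFF object.

    Assumes the classic 20-byte IMAGE_FILE_HEADER followed by 40-byte
    section headers; /bigobj objects are not handled.
    """
    # Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
    # NumberOfSymbols, SizeOfOptionalHeader, Characteristics
    (_machine, nsections, _stamp, _symptr, _nsyms,
     opt_size, _chars) = struct.unpack_from("<HHIIIHH", obj_bytes, 0)

    sections = []
    base = 20 + opt_size  # section table follows the (usually empty) optional header
    for i in range(nsections):
        # Name[8], VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData
        name, _vsize, _vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<8sIIII", obj_bytes, base + 40 * i)
        sections.append(
            (name.rstrip(b"\0").decode("ascii", "replace"), raw_size, raw_ptr))
    return sections
```

Filtering the result for names starting with ".debug$" gives the file offsets and sizes of the raw bytes to dump and study.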

Related

PDB files do not include all .net source information

This may come down to my misunderstanding of PDB files and the build process, rather than any particular problem but I've struggled to find a good answer elsewhere.
We have recently been good little developers and started indexing and storing our pdb files on a central symbol server (all part of TFS). The problem is that our PDB files do not appear to include all the source information.
When trying to navigate to sources in Visual Studio, the pdb files of our assemblies are found, as shown by the output window:
PdbNavigator: Downloader: file://server/Symbols/my.assembly.pdb/1DB3F79EA3094EAAADFC6CDE6515FC871/my.assembly.pdb -> ok, 251 KB
PdbNavigator: No debugging information found on symbol servers for my.assembly, Version=1.0.1.1206, Culture=neutral, PublicKeyToken=4cd79aeab39b919b
But at the same time it says it found no sources. If I use some of the tools from the windows SDK I can see that the PDB file does not contain the information on about 30% of the source files in the project.
I think I read somewhere that PDB files only include the source for classes actually used within the project, but surely that creates a massive problem for any API type assemblies where multiple classes may have no function within the assembly, only when used from some other part of your project?
If anyone can shed light on this, please let me know.
Thanks.
A PDB (normally) doesn't store source code - it contains a list of "documents", which are the source code file names, and "method information", which maps source lines to offsets in the assembly or binary. A PDB matches when the signature and build date of the assembly match those recorded in the PDB file. Chances are, the MyAssembly.pdb has the correct version, but the signature and/or build date don't match.
The signature is not exposed as far as I know, but you may find some code on the Internet that says how to read a PE signature and a PDB signature so you can do a comparison.
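For what it's worth, the comparison can be sketched in Python, assuming the image's debug directory holds a CodeView record in the commonly described "RSDS" layout (4-byte tag, 16-byte GUID, 4-byte age, null-terminated PDB path). Finding that record inside the PE file is left out here, and the function names are hypothetical.

```python
import struct
import uuid

def parse_rsds(record):
    """Split a CodeView 'RSDS' record into (guid, age, pdb_path)."""
    if record[:4] != b"RSDS":
        raise ValueError("not an RSDS CodeView record")
    guid = uuid.UUID(bytes_le=record[4:20])          # GUID is stored little-endian
    (age,) = struct.unpack_from("<I", record, 20)
    path = record[24:].split(b"\0", 1)[0].decode("utf-8", "replace")
    return guid, age, path

def symbol_server_key(guid, age):
    """Build the directory name used in symbol store paths: GUID hex + age in hex."""
    return guid.hex.upper() + format(age, "X")
```

If the layout assumption holds, the key built from the assembly should equal the directory component in the symbol server path from the output window (the long hex string between the two file names). A mismatch would point to a PDB from a different build.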

Viewing Codeview symbols

I have an old game executable with a large section of debug symbols, apparently in the Codeview format. How can I view the contents of this section in a human-readable format?
Current Windows compilers do not put the content of the debug symbols into the image file itself; they only put a reference to an external symbols file into the image. They put the debug data into a separate symbols file with the PDB (Program Database) extension. As you mentioned, this format is also named CodeView. In your case, since the debug section is large, it looks like you might be dealing with a really old symbols format.
This article explains the different symbols formats.
Okay, given that this is for 32-bit Windows, I believe the normal Windows symbol handler API should be able to read the data. From there, it's pretty much a matter of deciding what data you want, and how you want it formatted.
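Before picking a tool, it can help to check which flavour of CodeView data the executable carries. Below is a rough Python sketch that looks at the first four bytes of the debug blob. The signature meanings are the ones commonly reported for these tags and should be treated as assumptions, and extracting the blob from the executable is not shown.

```python
# Signature tags as commonly reported; treat the descriptions as assumptions.
KNOWN_SIGNATURES = {
    b"NB09": "CodeView 4.10 data embedded in the image or a .dbg file",
    b"NB11": "CodeView 5.0 data embedded in the image or a .dbg file",
    b"NB10": "reference to an external PDB 2.0 file",
    b"RSDS": "reference to an external PDB 7.0 file",
}

def classify_codeview(blob):
    """Describe a CodeView debug blob based on its 4-byte signature."""
    return KNOWN_SIGNATURES.get(bytes(blob[:4]), "unknown signature")
```

A large section starting with one of the NB09/NB11 tags would be consistent with symbols stored inside the executable itself rather than in a PDB.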

Convert DBG into PDB?

I have a Win32 compiler which, for years, has been able to create a DBG debug information file.
This has allowed debuggers, and tools like Process Explorer and Process Monitor, to have access to symbol information.
I recently learned that Visual Studio's debugger no longer accepts DBG files, only undocumented Program Database (PDB) files.
Since Microsoft keeps the PDB format secret, I assume they have a tool that will allow me to convert existing debugging information to a PDB (so I don't learn the secrets of their file format).
Bonus Reading
cv2pdb: how to use to convert other debug formats to pdb?
Undocumented
Even though Microsoft has a GitHub repository for PDB, the spec remains completely undocumented. The files on their repository are incomplete. There are missing types and declarations.
And even though I've created a PDBViewer, it doesn't get me anything, because Microsoft doesn't explain what any of it means.
The point isn't just to look at a PDB - we need to create one. And for that we need to know:
what goes in it
where
and what format
PDB is not documented, but you can collect very detailed information about the content of PDB files programmatically using the appropriate interfaces. See Sample.
The PDB format is now documented-through-code by Microsoft in a GitHub repository. LLVM also have a great overview, partly based on Microsoft's documentation.
That's not a complete answer because you'll still need to write the tool to do the conversion...
LLVM developers documented the PDB file format in order to make clang and lld able to read and produce PDB files. Microsoft's PDB Github repository was put up, in part, to support that work.
PDB is primarily a container for CodeView debug info, which is documented by Microsoft.
LLVM provides libraries for working with PDBs and COFF debug info, as well as command line tools for inspecting and generating them from YAML.

What is the structure of a PDB file?

I am trying to understand how a debugger uses PDB file. It would probably be a small file system in itself. Could someone help me understand the structure of the PDB file?
According to this blog post, the actual file format is kept secret by MS. However, I recommend you read that post, as it has a lot of useful information about what a PDB file is and how it's used.
According to MSDN, it's impractical:
Because the format of the .pdb file generated by the postcompiler
tools undergoes constant revision, exposing the format is impractical
A debugger would use the DIA SDK to access the data inside them, meaning you don't really need to know its structure.
As a matter of fact, the format of PDB is not documented, but you can collect very detailed information about the content of PDB files programmatically using the appropriate interfaces. See Sample.
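The "small file system" guess in the question lines up with how LLVM's PDB documentation describes the outer container (MSF): a file divided into fixed-size blocks, with a superblock at the start that points to a directory of streams. Here is a hedged Python sketch of reading just that superblock. The field names follow my reading of the LLVM documentation, and the function name is made up for illustration.

```python
import struct

# 32-byte magic at the start of an MSF 7.00 (modern PDB) file.
MSF_MAGIC = b"Microsoft C/C++ MSF 7.00\r\n\x1aDS\x00\x00\x00"

def parse_msf_superblock(data):
    """Return the six little-endian uint32 fields that follow the magic."""
    if data[:len(MSF_MAGIC)] != MSF_MAGIC:
        raise ValueError("not an MSF 7.00 file")
    names = ("block_size", "free_block_map_block", "num_blocks",
             "num_directory_bytes", "unknown", "block_map_addr")
    values = struct.unpack_from("<6I", data, len(MSF_MAGIC))
    return dict(zip(names, values))
```

Under these assumptions, block_size times num_blocks should equal the file size, and block_map_addr is the index of the block listing where the stream directory lives. Everything beyond that (the streams themselves) is where the CodeView content is stored.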

Embed .pdb debug symbol information into an .exe file in Visual Studio

I am experimenting with an analysis tool that can analyze executable files with embedded debug symbol information in Windows. While trying this tool on several open source projects, I realized that most of the builds do not keep symbolic information in executable files. I am able to compile the source code with VS (2008), but the build normally keeps the debug information in a separate .pdb file, not in the .exe file (unfortunately I only want to read debug information from the .exe file and not the .pdb file :-().
Does anybody know a way to embed symbol debug information into a single .exe file using Visual Studio?
I know this is a pretty old issue but this feature has recently been merged into Roslyn: https://github.com/dotnet/roslyn/issues/12390
The MSDN says that it isn't possible.
It is not possible to create an .exe or .dll that contains debug information. Debug information is always placed in a .pdb file.
I don't know yet how to do it, but there's an article on MSDN that talks about it.
A portable executable (i.e. .exe or .dll) can have a flag present in the header: (archive)
IMAGE_FILE_DEBUG_STRIPPED
Debugging information was removed and stored separately in a .dbg file.
This implies that debugging information can be in the executable, and has the option of being removed and stored in a separate .dbg file.
From MSDN article DBG Files: (archive)
DBG files are portable executable (PE) format files that contain debug information in Codeview format for the Visual Studio debugger (and possibly other formats, depending on how the DBG was created). When you do not have source for certain code, such as libraries or Windows APIs, DBG files permit debugging. DBG files also permit you to do OLE RPC debugging.
DBG files have been superseded by PDB files, which are now more commonly used for debugging.
You can use the REBASE.EXE utility to strip debug information from a PE-format executable and store it in a DBG file. The file characteristic field IMAGE_FILE_DEBUG_STRIPPED in the PE file header tells the debugger that Codeview information has been stripped to a separate DBG file.
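Checking that flag on a given executable is straightforward with the documented PE header layout. This is a minimal Python sketch, assuming a well-formed PE file passed in as bytes. The function name is hypothetical, and 0x0200 is the value the PE/COFF specification gives for IMAGE_FILE_DEBUG_STRIPPED.

```python
import struct

IMAGE_FILE_DEBUG_STRIPPED = 0x0200  # bit in IMAGE_FILE_HEADER.Characteristics

def is_debug_stripped(pe_bytes):
    """Return True if the PE header says debug info was moved to another file."""
    if pe_bytes[:2] != b"MZ":
        raise ValueError("missing MZ header")
    # e_lfanew at offset 0x3C holds the file offset of the 'PE\0\0' signature.
    (e_lfanew,) = struct.unpack_from("<I", pe_bytes, 0x3C)
    if pe_bytes[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # Characteristics is the last 2 bytes of the 20-byte COFF file header.
    (characteristics,) = struct.unpack_from("<H", pe_bytes, e_lfanew + 4 + 18)
    return bool(characteristics & IMAGE_FILE_DEBUG_STRIPPED)
```

A False result only means the flag is clear; it does not by itself prove that the image carries embedded debug data.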
A knowledge base article describing the COFF format mentions the dumpbin utility, and its /SYMBOLS option:
/SYMBOLS Setting this option causes DUMPBIN to display the COFF symbol
table. Symbol tables exist in all object files. A COFF symbol
table appears in an image file only if it is linked with
/DEBUG /DEBUGTYPE:COFF
The next step, and the part that would answer our question is:
what format is the embedded debugging information?
where in the PE is the embedded debugging information stored? (resource?, data section?)
But the answer "it cannot be done" seems to be incorrect.
See also
IMAGE_FILE_HEADER structure (archive)
LOADED_IMAGE structure (archive)
DBG Files (archive)
Microsoft PE and COFF Specification (archive)
An In-Depth Look into the Win32 Portable Executable File Format (archive)
KB121460 - Common Object File Format (COFF) (archive)
There is no built-in support in Visual Studio for this type of operation (at least for managed languages). The .PDB and .EXE files are created at the same time and have no option for embedding. I'm not even sure the .EXE format supports embedding PDB symbols although I could be wrong on this point.
The only course I can see is embedding the PDB as a resource in the .EXE. However, that would have to be a post-build step since the two are built at the same time. And there is the potential for invalidating parts of the PDB if you modify the EXE after it's been built.
Is there a particular reason you're trying to do this? I'm imagining it's going to end up causing you a lot of pain as 1) it's not supported AFAIK and 2) the tool chain is geared towards looking for the PDB in the same directory, not within the .EXE. Deploying 2 files is a bit annoying at first, but it's how it's done at this point.
I'm pretty sure PDBs were always stand-alone files. VC++ used to have a switch that would cause it to emit (limited compared to PDB) symbol information to a "CodeView" .DBG file that by default was embedded in the EXE. However, that switch appears to no longer be supported in the newer (post 6.x ?) versions of the compiler.
