Implementation question #4
Hi @gynt, I've never attempted to delink C++ code before. I've given it some thought and ran some tests on my side, so here's my brain dump on this. I tried to keep this accessible without going into too many details, which is why there's a TL;DR at the end if you just want a "what would it take?" answer. That being said, I'll preemptively answer these two questions first:
Short answer: it's transparent to the end-user of the extension. Long answer: the relocation synthesizer analyzer leverages symbols, references and data types present inside a Ghidra database to identify and undo relocation spots, whose work can be audited using
It doesn't. You can export any subset of a program at the symbol granularity as an object file, regardless of how the program was structured initially. It's just section bytes, symbols and relocations at this point. Put another way: as long as you don't cut across a variable or function, you could theoretically cut up a program into any arbitrary shape you want and this extension should generate a set of object files that, if all linked together, should produce a working program that is functionally identical to the original one.

C++ and delinking in theory

My whole delinking shtick relies on the fact that traditional linkers work on object files and are language agnostic. From their point of view, an object file built using a C++ compiler is no different than an object file built using a C compiler or the output produced by an assembler. Within that assumption, delinking C++ (or any language following the traditional compile-assemble-link toolchain flow) should be theoretically possible... but there are nevertheless some ABI concerns to keep in mind.

Symbol name mangling

C++ mangles symbol names. That's not a direct concern for delinking, but the symbol names do need to be correct for subsequent linking to work and my extension currently doesn't have any specific support for that. You could mangle the symbol names by hand inside Ghidra, but a saner option would be to add mangling options to the object file exporter. There might be some corner cases to account for (like handling

C++-specific data

While traditional linkers are blind to C++'s considerations, compilers do emit C++-specific data like RTTI or unwind tables. If you try to delink code using these features, that data must be exported alongside the rest of the object file for it to work correctly. If annotated properly, that data should be theoretically delinkable like any other piece of data (hopefully).
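To make the symbol name mangling point concrete, here is a minimal sketch that demangles names with `abi::__cxa_demangle`. It assumes a toolchain following the Itanium C++ ABI (GCC or Clang on Linux, for instance); MSVC uses a completely different scheme, so none of this applies there. The `demangle` helper is a hypothetical name introduced for illustration and is not part of the extension.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (Itanium C++ ABI toolchains only)

#include <cstdlib>
#include <string>

// Hypothetical helper: returns the human-readable form of an Itanium-mangled
// symbol name, or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled)
{
    int status = 0;
    // With null buffer arguments, the result is allocated with malloc().
    char* readable = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return mangled;
    }
    std::string result(readable);
    std::free(readable);
    return result;
}
```

For a class `A` with a member function `int32_t touchB()`, the Itanium-mangled name is `_ZN1A6touchBEv`, which this helper turns back into `A::touchB()`. An exported object file has to carry the mangled form for the linker to resolve references to it from freshly compiled C++ code.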
C++ and delinking in practice

The test case

Rather than just blindly guessing, let's put my extension to the test with a very basic C++ program, which should return 2 as its status code:

#include <stdint.h>

class A
{
public:
    int32_t number = 0;
    int32_t touchB();
};

class B
{
public:
    int32_t number = 1;
    int32_t touchA();
};

A a;
B b;

int32_t A::touchB()
{
    this->number = this->number + b.number;
    return this->number;
}

int32_t B::touchA()
{
    this->number = this->number + a.number;
    return this->number;
}

int main()
{
    a.touchB();
    b.touchA();
    return b.number;
}

I've done this exercise with both
I've attached an archive with all those files for reference.

The results
Overall there are things to fix (symbol name mangling, missed
TL;DR

It sorta works in the current state with a bunch of pitfalls. There are things I didn't test, but with some fixes/improvements I think delinking C++ code can be made to work well enough to be usable in practice.

I did assume that you want to delink C++ programs back into object files for the same toolchain/platform, so no crazy Linux-to-Windows or PlayStation-to-Linux chimeras like I do. This side-steps a whole bunch of cross-platform ABI compatibility issues that are too scary to contemplate for C++.

Also, looking at your GitHub profile I can probably bet that you want to delink Windows executables built using the MSVC toolchain. You won't be able to cheat with MinGW like I did once, since these two toolchains are reportedly compatible only at the C interface level. Therefore, you'll probably need a COFF object file exporter in order to produce object files that MSVC can grok. I only have an ELF object file exporter at the moment, but my data model and analyzers should be generic enough for COFF. A prototype could probably be banged out in a weekend binge, but object file exporters are very finicky to get just right and a fairly exhaustive regression test suite is all but required to have any confidence in the results.

Also, I only have code analyzers for i386 and 32-bit MIPS. CISC architectures are fairly easy to analyze, so adding x86_64 support should be fairly easy. RISC architectures on the other hand... Let's just say I'm at my fifth attempt for MIPS and it's still wonky.

Post-scriptum

Sorry for the huge wall of text. I've found that delinking is an esoteric topic that requires paying attention to a lot of very fine details in order to work. I've automated it down to a couple of clicks with my extension in practice, but unfortunately there are no such shortcuts available for theory.

I should probably write a book at some point because there are hardly any resources about delinking out there, let alone an authoritative source I could cite for brevity's sake. At the very least, it might make for very scary bedtime reading for linker developers.
If you would write a book on this topic I would read it!
You are correct. I am trying to use this on an i386 Windows PE binary from twenty years ago. I found objconv, which can allegedly translate ELF into COFF; I haven't tried it yet though.
The binary I want to use this on is 99% C++ member functions for C++ static singleton variables that are statically constructed before main() is run. So I kinda need the symbol name mangling, or I need to write my own inline assembly code to link to the object file, which isn't going to be pretty (but it basically is just a
I don't think I care about eh_frame in my use case of this Ghidra extension
Makes sense because there weren't any virtual functions in your example. So no dynamic casting happens at runtime. My binary consists mostly of C style things (
Do you mean you didn't have any of that info in the original compiled program? I guess because a and b are not declared

Links I found useful

On ignoring eh frames: https://stackoverflow.com/questions/26300819/why-gcc-compiled-c-program-needs-eh-frame-section
I should clarify that while I know enough about ELF and Linux to pull off this dark magic, this doesn't apply to COFF and Windows. So all my answers are implicitly prefixed by "hopefully COFF and MSVC don't do something completely different than ELF and gcc".
Old toolchains were a lot dumber than what we have today. It's unlikely the linker did something smart that causes a migraine... but it's possible it did something stupid instead. That being said, old artifacts are mostly good news for delinking. No section garbage collection and no link-time optimizations means programs tend to be fairly straightforward in their layout. You might even be able to make decent guesses where the original boundaries of the object files were.
Since Ghidra can have multiple labels for a given address (with one designated as the primary label), the simplest option for symbol name mangling would be to put an option to prefer a mangling scheme in the exporter. In the test case, for one of the methods Ghidra created both the primary label Here, Ghidra picked up the mangled names from the
If you don't care about RTTI, unwind tables, SEH and whatever else Windows does differently, you could just delink a C++ program as if it was a C program. It will probably work as long as the delinked code doesn't try to use these features. If it tries however, you'll have some very exotic undefined behavior on your hands. Just for reference, the relocations inside
Hopefully this means you mostly have "C with classes" instead of idiosyncratic C++. Probably good news for delinking.
This test program doesn't have any global constructors/destructors. It should be no different than any C++-generated data, so if I were to include the necessary bits of

Overall I think your use-case is doable: my extension is missing a COFF object file exporter and some minor symbol name handling improvements, but my data model and my analyzers (the really tricky parts) should work out of the box. You might want to play a bit with the existing ELF support first and follow along the articles in my blog to get a feel for the workflow.
So I've investigated this a bit on my side on Linux and I've identified a tricky source of problems for C++: section groups, known as COMDAT in Microsoft land. This covers stuff like vtables, typeinfos, implicit/default constructors/destructors, inline functions, implicit template instantiations...

As far as I can tell, these bits can be delinked like any other code or data. They probably won't be a problem during object file exportation as long as they are external references: I think the definitions could come from another object file without any issues, but I haven't actually tested that part.

However, if these bits are exported as part of an object file then it's another story. If these sections aren't handled specifically by the object file exporters, it will lead to multiple symbol definitions down the line during linking since these sections are supposed to be deduplicated. Hopefully most of it can be ignored with the external reference escape hatch mentioned above.

Another thing to keep in mind is C++ ABI compatibility. It's not too much of a problem on Linux as far as I know, but it appears Microsoft doesn't provide any guarantees there across MSVC versions, at least before Visual Studio 2015. You'll probably need to use the same toolchain used to build the original program when reusing its exported bits elsewhere.

In conclusion, I still think delinking C++ code is theoretically doable and my extension can probably handle it if it is suitably improved, but it's going to be trickier than just plain C code since the ABI surface is much larger. At any rate, the biggest blocker for your use-case is the COFF object file exporter. I might get around to doing all of that eventually, but I can't make any promises or give any timeline: if you want it anytime soon, you'll probably have to get your hands dirty.
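The multiple-definition problem described above can be illustrated with a toy model of symbol resolution. This is only a sketch under heavy simplifying assumptions: `Definition` and `resolveDefinitions` are hypothetical names, and real linkers have more nuanced rules (weak symbols, COMDAT selection kinds, group signatures) than the two cases modeled here.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified view of one symbol definition in an object file.
struct Definition {
    std::string symbol;      // e.g. "_ZTV1A", the Itanium name of A's vtable
    std::string objectFile;  // object file providing the definition
    bool comdat;             // true if it lives in a section group / COMDAT
};

// Toy resolution pass: maps each symbol to the object file whose definition
// is kept. Duplicate COMDAT copies are silently discarded (first one wins);
// any other duplicate is reported as a multiple definition error.
std::map<std::string, std::string> resolveDefinitions(
    const std::vector<Definition>& definitions)
{
    std::map<std::string, Definition> chosen;
    for (const Definition& def : definitions) {
        auto [it, inserted] = chosen.emplace(def.symbol, def);
        if (inserted) {
            continue;  // first definition seen for this symbol
        }
        if (it->second.comdat && def.comdat) {
            continue;  // deduplicated, as section groups are meant to be
        }
        throw std::runtime_error("multiple definition of " + def.symbol);
    }

    std::map<std::string, std::string> result;
    for (const auto& [symbol, def] : chosen) {
        result[symbol] = def.objectFile;
    }
    return result;
}
```

In this model, two object files that both carry a vtable marked as COMDAT link fine, while the same two definitions exported as plain sections fail. That is the failure mode to expect if an exporter drops the grouping information.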
Coming back to this after getting everything working myself, I wanted to add some comments about global initializers, SEH, and resources in MSVC that seem relevant in case someone else comes across this.

Simply including the list of global initializer function pointers in the delink selection isn't enough for MSVC to relink them properly; they need to be in a section with a specific name to be incorporated into the CRT by As described in CRT initialization, one can relink the global initializers with Microsoft

#pragma section(".CRT$XCU", read)
#define X(x) \
    extern void x(void); \
    __declspec(allocate(".CRT$XCU")) void (*__xc_u_0_##x)(void) = x;
X(FUN_008e4690)
/* etc... */

Regarding SEH, nothing special needs to be done, just make sure the pointer members of the record structs are marked as addresses in Ghidra and they are included in the delinker selection. Relevant section https://github.com/widberg/fmtk/wiki/Decompilation#42-tls-callbacks-structured-exception-handling-and-c-exceptions

Finally, the easiest way to handle the resources is to extract them with Resource Hacker and relink them. You might run into the same thing I did with Windows Side-by-Side where you need to delete/replace the manifest resource. Relevant section https://github.com/widberg/fmtk/wiki/Decompilation#43-resources

In general, the extension works great with C++ using the same toolchain as the original executable. I haven't gone too deep into replacing functions yet but keeping the symbol names consistent and cutting out the code I replace has been enough to keep me out of trouble so far.
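For readers unfamiliar with why global initializers need this kind of registration, here is a small portable sketch of what produces them in the first place. A global object with a non-trivial constructor makes the compiler emit a hidden initializer function, and a pointer to that function is placed in a toolchain-specific table that the runtime walks before `main()` (the `.CRT$XCU` section mentioned above for MSVC, `.init_array` on current ELF toolchains). The names `initLog`, `Singleton` and `g_singleton` are made up for this example.

```cpp
#include <string>
#include <vector>

// Function-local static, so the log is guaranteed to exist before the first
// global constructor that uses it, regardless of initialization order.
std::vector<std::string>& initLog()
{
    static std::vector<std::string> log;
    return log;
}

// A class whose constructor has a side effect, so it cannot be constant
// initialized and needs a dynamic initializer at startup.
struct Singleton {
    int value;

    explicit Singleton(int v) : value(v)
    {
        initLog().push_back("Singleton constructed");
    }
};

// The compiler emits a hidden initializer function for this object and
// registers a pointer to it so the runtime calls it before main().
Singleton g_singleton(42);
```

If the pointer to that hidden function is delinked as ordinary data but ends up outside the table the runtime scans, the constructor never runs and `g_singleton` stays zero-initialized, which is why the section name matters when relinking.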
I am planning on using this!
I am wondering how you deal with the following scenario:
Two singleton C++ classes (A and B) reference each other's data inside functions.
In optimized compiled code, A and B may be placed next to each other in memory, and therefore the reference to b->number in machine code might look something like:
or like
How does this program delink this? I am guessing that it puts these functions in the same object file, especially in the latter example.
Because how would it know these came from different files? (and therefore different obj files).
Of course I can try this out for myself, but I figured I'd ask before I embark down this rabbit hole! It will set my expectations.
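As a rough illustration of the mechanism this question is about, here is a minimal sketch of undoing one absolute 32-bit reference, in the spirit of an i386 `R_386_32` relocation with the addend stored in place. It is not the extension's actual implementation: `Relocation`, `unrelocateAbs32` and the "nearest preceding symbol" lookup are assumptions made for the example, and the addresses used with it are invented.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified relocation record.
struct Relocation {
    std::size_t offset;   // where the reference lives in the section bytes
    std::string symbol;   // symbol the reference points into
    std::int32_t addend;  // distance from the symbol's start to the target
};

// Reads a little-endian 32-bit value, independent of host endianness.
static std::uint32_t readLe32(const std::vector<std::uint8_t>& bytes, std::size_t at)
{
    return static_cast<std::uint32_t>(bytes[at]) |
           (static_cast<std::uint32_t>(bytes[at + 1]) << 8) |
           (static_cast<std::uint32_t>(bytes[at + 2]) << 16) |
           (static_cast<std::uint32_t>(bytes[at + 3]) << 24);
}

// Writes a little-endian 32-bit value.
static void writeLe32(std::vector<std::uint8_t>& bytes, std::size_t at, std::uint32_t value)
{
    for (int i = 0; i < 4; ++i) {
        bytes[at + i] = static_cast<std::uint8_t>(value >> (8 * i));
    }
}

// Replaces the absolute address stored at `offset` with an addend relative to
// the nearest symbol at or before it, and returns the relocation to record.
std::optional<Relocation> unrelocateAbs32(
    std::vector<std::uint8_t>& bytes, std::size_t offset,
    const std::map<std::uint32_t, std::string>& symbolsByAddress)
{
    if (offset + 4 > bytes.size()) {
        return std::nullopt;  // reference would run past the section
    }
    const std::uint32_t target = readLe32(bytes, offset);

    auto it = symbolsByAddress.upper_bound(target);
    if (it == symbolsByAddress.begin()) {
        return std::nullopt;  // no symbol at or before the target address
    }
    --it;

    const std::int32_t addend = static_cast<std::int32_t>(target - it->first);
    writeLe32(bytes, offset, static_cast<std::uint32_t>(addend));
    return Relocation{offset, it->second, addend};
}
```

With `a` at a made-up address `0x0804a010` and `b` at `0x0804a014`, the instruction bytes `A1 14 A0 04 08` (`mov eax, [0x0804a014]`) become `A1 00 00 00 00` plus a relocation against `b` with addend 0. Once every reference is expressed this way, the functions and the variables they touch no longer need to sit in the same object file.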