[not ready for merge] Userspace stack tracing from kernel programs #466
This has always been a bit of a far-off idea, but with the module API it's working, some of the time (definitely not all of the time). So I thought I'd share it to see if some of the tweaks necessary to make it happen would be reasonable.
Basically, there's a GDB script called `pstack` that attaches GDB and takes a stack trace of a program. You can also use `/proc/$PID/stack` to get the kernel stack (assuming it's running in kernel mode or blocked). I was hoping to come up with a way to replicate that behavior in drgn, in a way that would work against `/proc/kcore` or `/proc/vmcore`. Essentially, it would allow you to get userspace stack traces from the crashed kernel (but not necessarily the whole core dump, like `contrib/gcore.py`) in the kdump kernel just before or after dumping the vmcore. (Presumably, userspace pages are filtered for 99.9% of all kdump configurations, so you'd need to run this while `/proc/kcore` is still available.)

The main part of this requires creating a custom program that has a memory reader, as well as specifying all the required Modules, their biases, and their address ranges. From there, you can get the userspace `struct pt_regs` from the kernel program, copy it to the user program, and then unwind the stack.

To do this, I've needed to tweak drgn a bit:

- `drgn_error` has the wrong error code. So I added a special case to pass fault errors through to the drgn error. This could be made more general, but I don't actually think it would be good to do that generally.
- `.gnu_debugdata` which is helpful for my use case.

At the end of the day, I'm quite confident that none of this is ready for merge, but I did wonder if any of the individual changes make sense to include?
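To make the "Modules, their biases, and their address ranges" step more concrete, here is a minimal, self-contained sketch of the bookkeeping involved. It is not the code from this PR: it uses `/proc/$PID/maps`-style text as a stand-in for whatever mapping information is read from the kernel program, and the names `ModuleInfo`, `parse_maps`, and `SAMPLE` are hypothetical. The bias calculation assumes that the first mapping of each file corresponds to an ELF segment whose virtual address equals its file offset, which is typical for PIE executables and shared objects but is not guaranteed.

```python
from typing import Dict, NamedTuple


class ModuleInfo(NamedTuple):
    start: int  # lowest mapped address belonging to the file
    end: int    # one past the highest mapped address
    bias: int   # assumed load bias (see the assumption in parse_maps)


def parse_maps(text: str) -> Dict[str, ModuleInfo]:
    """Group /proc/PID/maps-style lines by their backing file."""
    modules: Dict[str, ModuleInfo] = {}
    for line in text.splitlines():
        fields = line.split(None, 5)
        # Skip anonymous mappings and special ones such as [stack] or [vdso].
        if len(fields) < 6 or not fields[5].startswith("/"):
            continue
        addr_range, _perms, offset, _dev, _inode, path = fields
        path = path.strip()
        start_s, end_s = addr_range.split("-")
        start, end = int(start_s, 16), int(end_s, 16)
        prev = modules.get(path)
        if prev is None:
            # ASSUMPTION: bias = mapped address - file offset of the first
            # mapping. A real tool would consult the ELF program headers.
            modules[path] = ModuleInfo(start, end, start - int(offset, 16))
        else:
            # Later mappings of the same file only widen the address range.
            modules[path] = ModuleInfo(
                min(prev.start, start), max(prev.end, end), prev.bias
            )
    return modules


# Made-up sample input, for illustration only.
SAMPLE = """\
55d0c0a00000-55d0c0a2f000 r--p 00000000 fd:01 1234 /usr/bin/bash
55d0c0a2f000-55d0c0b0e000 r-xp 0002f000 fd:01 1234 /usr/bin/bash
7f1e2c000000-7f1e2c028000 r--p 00000000 fd:01 5678 /usr/lib64/libc.so.6
7f1e2c028000-7f1e2c19d000 r-xp 00028000 fd:01 5678 /usr/lib64/libc.so.6
7ffd5a3e0000-7ffd5a401000 rw-p 00000000 00:00 0 [stack]
"""

if __name__ == "__main__":
    for path, info in parse_maps(SAMPLE).items():
        print(f"{path}: [{info.start:#x}, {info.end:#x}) bias={info.bias:#x}")
```

Each resulting entry holds what would then be handed to the user program for one module: a name, an address range, and a bias.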
For fun, here's the result of running the `contrib/pstack.py` script against the current bash process: