Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Give overloaded functions a unique name #76

Merged
merged 3 commits into from
Jan 20, 2025

Conversation

disinvite
Copy link
Collaborator

@disinvite disinvite commented Jan 18, 2025

Update: Now enabled for console and HTML output from asmcmp.

For functions with the same name but different parameters (overloaded functions) we set a unique name that contains the argument list and use this in asm sanitize. For example:

  • list<MxDriver,allocator<MxDriver> >::iterator::operator++(int)
  • MxRect32::MxRect32(void)
  • MxPalette::MxPalette(struct tagRGBQUAD const *)

This is done by running the symbol from the PDB through the demangler function. If we don't have a symbol for some reason, we just number the function.

This uses a new entity attribute "computed_name" as to not interfere with "name". The ghidra scripts need the base name without arguments to set the function labels.

Limitations:

  • Does not distinguish const and non-const return types. This can be added later.
  • Thunks are not altered. We would need to maintain a reference back to the original function address. Once we have that, set all the thunk names as a final step (after the original functions have been updated) or generate the name dynamically when needed. This needs its own PR because it requires more database changes.

@jonschz
Copy link
Collaborator

jonschz commented Jan 18, 2025

While the change sounds nice, I am a little confused what the impact of this change is. Where does the user observe the new names? Or are they just added for potential future use? I haven't seen an obvious impact on asmcmp.

Also, do you have a good idea on how to test this change?

@foxtacles
Copy link
Member

foxtacles commented Jan 18, 2025

This will be very useful once it ends up in the ASM diff (once const is added too). I've been matching functions with calls to functions like operator[] of Vector where the constness matters and right now it's hard to distinguish in the diff

@disinvite
Copy link
Collaborator Author

disinvite commented Jan 18, 2025

I added some tests for the match_name() function that should have been there long before this. We can't unit test the new function in core.py easily because the "operations" from the compare core are not modularized (yet).

Right now, this just changes the sanitized assembly output. It's most useful in BETA10 where we have everything fully implemented but not annotated. You can see more clearly whether a particular constructor is a void or a copy constructor just from the listing, and then mark the address if you're working on that area.

This feature is probably more useful at the beginning of a decomp. It's more likely to expose a problem with annotations than a genuine diff where you called the wrong function. For example: in LEGO1, two addresses for _Construct functions are swapped.

If I add this as a new field to DiffReport and set these new names in the HTML output, you get this:

image

The repeated names are const versus non-const return types.

@disinvite
Copy link
Collaborator Author

The reason to not add this to the text output from asmcmp is that the isle CI just does a regular diff on the output. Changing a lot of the names will create a huge diff for the next pull request. We could change that to use the JSON output instead (--save and --diff args) which compares the match percentage using only the address.

@jonschz
Copy link
Collaborator

jonschz commented Jan 19, 2025

I'll briefly look into using JSON on the isle pipelines. Looks very promising overall!

@disinvite
Copy link
Collaborator Author

Thanks to @jonschz and isledecomp/isle#1352 we can now enable this for text and html output. Should we try to get the const return types added here or in a follow up?

@jonschz
Copy link
Collaborator

jonschz commented Jan 19, 2025

I'm fine with adding it in a follow-up. Will try to do the review today

Copy link
Collaborator

@jonschz jonschz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great overall!

)""",
(EntityType.FUNCTION,),
):
# TODO: Thunk's link to the original function is lost once the record is created.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand this comment

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we set a unique name for an overloaded function, we should also update any thunks that point to it. The problem is that we don't set an attribute on the thunk entity that refers back to the original function, so there isn't any way to do this without re-reading all the thunks from the binary.

It is also the case that multiple thunks can point at the same function, so it would be wrong (and not very helpful) to have Thunk of 'Example::Function'(1) and Thunk of 'Example::Function'(2), if both point to the same thing.

We need to record the link between the thunk and actual function when we create the thunk entity. One complication with doing this is that we have no single global unique ID. Instead we have two: orig_addr and recomp_addr, so we need to have two "reference" attributes, one for each binary. I have some stashed code that will address this but it's part of a larger change that allows for two entities to be merged when we match their addresses.

@disinvite disinvite merged commit 9c2a6ba into isledecomp:master Jan 20, 2025
11 checks passed
@disinvite disinvite deleted the overloaded branch January 20, 2025 03:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants