Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reallocting location array only when bytecode size is greater than the location size. #84

Merged
merged 1 commit into from
Mar 28, 2024

Conversation

nmdis1999
Copy link
Contributor

Previously location array was reallocated for every instruction, which was ineffective and caused a lot of performance overhead for programs which did not need reallocation (such as those which had fewer case when bytecode size was larger than the current location array size). This fix fetches back the performance so lost.

src/lyk.c Outdated
// YKOPT: Reallocating for every instruction is inefficient.
YkLocation **new_locations = calloc(f->sizecode, sizeof(YkLocation *));
lua_assert(new_locations != NULL && "Expected yklocs to be defined!");
if (f->yklocs == NULL || f->sizecode > f->yklocs_size) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should be something like:

if (f->yklocs == NULL) {
  f->ykcallocs = calloc(...);
  f->yklocs_size = ...;
  ...
}

if (f->sizecode > f->yklocs_size) {
  realloc(...)
}

A related question: are sizecode and yklocs_size counting number of elements or number of bytes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my first implementation used realloc, but I hadn't thought of this. Is realloc always faster than calloc?
sizecode keeps track of number of instructions in the bytecode array, I am guessing that yklocs_size contains number of locations. So to answer your question, so both are counting number of elements.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use calloc you allocate two chunks of memory, and have to free one. realloc does that and it moves the old data you had.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed here: 60dca65

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am guessing that yklocs_size contains number of locations

Can you check and make sure of this please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is number of locations. Here is the commit fixing the comments: 1dee9a9

@nmdis1999
Copy link
Contributor Author

@ltratt please lmk if we need to change anything else in this PR.

@ltratt
Copy link
Contributor

ltratt commented Mar 27, 2024

Please squash.

@nmdis1999
Copy link
Contributor Author

@ltratt done.

@ltratt
Copy link
Contributor

ltratt commented Mar 28, 2024

@nmdis1999 Please force push an update with an updated commit message. The title needs to be capitalised and of the form "Optimise the YkLocation buffer" and the body of the commit needs to explain why this was done: what was the problem? why this solution? are there some (summary) benchmark numbers?

@nmdis1999
Copy link
Contributor Author

@ltratt done

@ltratt
Copy link
Contributor

ltratt commented Mar 28, 2024

Can you force push an update without the (slightly dodgy, as it's not done properly on the RHS) |-- formatting please? Indeed, you can simplify this and just say something like "this improves verybig.lua by XYZx and doesn't make any other benchmark worse" if you'd like.

Location array was reallocated for every instructions, which was very ineffective
since we only need to reallocate the buffer either when yk location is set to NULL
or when the number of bytecode instructions is greater than number of location
in the yk location array. This was causing around 1000x slowdown in benchmark
test such as verybig.lua. The fix improves verybig.lua by 300x and
brings performance from 198sec to 0.650sec when JIT is off.
@nmdis1999
Copy link
Contributor Author

Can you force push an update without the (slightly dodgy, as it's not done properly on the RHS) |-- formatting please? Indeed, you can simplify this and just say something like "this improves verybig.lua by XYZx and doesn't make any other benchmark worse" if you'd like.

done.

@ltratt ltratt added this pull request to the merge queue Mar 28, 2024
Merged via the queue into ykjit:main with commit 51b872e Mar 28, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants