Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial implementation of branch prediction using a BTB #124

Closed

Conversation

danbone
Copy link
Contributor

@danbone danbone commented Nov 27, 2023

Attempting to add some branch prediction logic to Olympia. I started with a branch target buffer.

Plan is to add a simple predictor, and a return address stack next. As this will require some form of recovery following speculative updates. Forming a foundation for further branch prediction modelling.

Early branch misprediction recovery could be done at a later date, but will require some more complex mechanism for restoring rename map before restarting.

Few points of interest:

  • Adding flushing on misprediction has pushed the runtime up considerable, dhrystone takes 250s+ to simulate compared to 12secs on master. It's only doing 10k flushes, I'm surprised that it has had such an impact.
  • The BTB model is basically just a clone of SimpleCache2 but the address decoder was a protected const, and I needed to change it. Any suggestion on a better way to do this?
  • I intend to send all the flushed instructions to rename serially so to free up the renaming registers. This has an added bonus as it can be used to cancel the scoreboard callbacks. Unless there's a better suggestion?
  • Should I update the JSON instruction generator? Will require new taken/target fields
  • I want the branch prediction stuff encapsulated a bit more, I was thinking of having a new flush field in the instruction set on prediction, that can trigger flushes within the different modules. i.e. BTB miss could set flush to DECODE_FLUSH, branch misprediction sets EXECUTE_FLUSH. Would have to consider whether the instruction that triggers the flush is included or excluded from the flush. Thoughts?
  • Any ideas on what tests to add?

@klingaard
Copy link
Collaborator

This is awesome! Thanks for putting this together. I'll check it out in more detail in a bit. I did take a few minutes at lunch to address your first POI:

Adding flushing on misprediction has pushed the runtime up considerable, dhrystone takes 250s+

I ran your branch through Apple's Instruments and found that Rename::scheduleRenaming is taking a huge amount of time. I added an assert on the size of uop_queeu_regcount_ data structure to make sure it doesn't grow big. It does! I think this data structure is not being cleared out properly on a flush. @aarongchan you might need to add this assert to make sure folks update Rename on flush situations.

@klingaard
Copy link
Collaborator

Yeah, if you add (untested) this line to the bottom of Rename::handleFlush_ performance goes back to normal:

uop_queue_regcount_data_.clear();

@klingaard klingaard added the enhancement New feature or request label Nov 27, 2023
core/BTB.hpp Outdated Show resolved Hide resolved
core/BTB.hpp Outdated
Comment on lines 73 to 78
void reset(uint64_t addr)
{
setValid(true);
setAddr(addr);
resetLHistCounter_();
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never been a fan of reset methods. Instead, you can do this after adding a new constructor that accepts an address:

   btb_entries_[idx] = BTBEntry(addr);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I can do this (unless I'm not understanding something, fairly new to C++).

The problem is that we're using the Cache and BasicCacheSet as the underlying data structures. The BTB acts as a shim.

The Cache returns pointers to the BTBEntry objects stored within a vector inside BasicCacheSet, the user can then manipulate the object using the pointer, but there isn't a way to put a new object into the vector down in the Set class.

Is there a way I can use the constructor you suggested to copy into the pointer object on the RHS?

(I tried to experiment a bit, but couldn't get the syntax right - complains that the btb_entry is a pointer and if use new then I think you'd get a memory leak?)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious on your experiment...

Anyway, this would work to "replace" or reset the object deep in the guts of the cache, me thinks (forgive the pseudo code):

auto entry_ptr = cache->getEntry();
*entry_ptr = BTBEntry();

core/BTB.hpp Outdated Show resolved Hide resolved
core/BTB.hpp Outdated Show resolved Hide resolved
core/BTB.hpp Outdated Show resolved Hide resolved
core/BTB.hpp Outdated Show resolved Hide resolved
core/Fetch.cpp Outdated Show resolved Hide resolved
core/Fetch.cpp Outdated Show resolved Hide resolved
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the API, I suggest you don't use references as the parameters, but instead use pointers:

(const BTBEntry * line)

Reason for this, look at this example of buggy code:

    auto btb_entry = btb_->getEntry();
    doSomeStuffToUpdateEntry_(&btb_entry);

Does getEntry return reference or a copy? The auto might make a copy and the call to doSomeStuffToUpdateEntry_ will not update the original.

core/Rename.cpp Outdated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the changes here reset the entirety of rename, preventing support for partial flushes. 🤔 -- would you be up to updating the flushing criteria to include those instructions being flushed?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Early branch misprediction recovery could be done at a later date, but will require some more complex mechanism for restoring rename map before restarting.

Ah, you already mentioned this...

@arupc
Copy link
Collaborator

arupc commented Nov 29, 2023

Thank you for initiating this.

To provide more history, per previous discussion, we intend to add branch prediction support via an API, along with an implementation of simple branch predictor. This would enable use of a simple predictor while allowing for implementation of other predictors. There's also an interest in re-using existing branch predictor implementations

Please see #1

I will start a discussion item on this topic. Let's collaborate via the discussion thread and review a proposal for branch prediction API

@danbone danbone mentioned this pull request Nov 29, 2023
@klingaard
Copy link
Collaborator

From our discussion in the SIG yesterday:

  • This is the first PR that actually hammers on the flushing mechanism (and actually found a bug).
  • While this doesn't use the proposed branch API from @arupc and team, it's not a show-stopper for moving forward with this PR
  • The flushing/redirect occurs in Decode, but only for a BTB miss (never seen before branch). When this occurs the entire pipe is flushed. Not sure how safe this is right. Might behoove us to actually implement a proper redirect from execution instead.
  • The PR is missing some fundamental testing. Specifically, we'd like to see a unit test just for the BTB as it stands so it can be grown over time. Also, augmentation of the JSON format might be handy to support taken/not taken branch behavior to test flushing, etc.

@danbone
Copy link
Contributor Author

danbone commented Dec 12, 2023

Updated branch to walk the ROB when restoring the rename map table following a flush, to fix the hack I did in the previous commit, and in anticipation of adding execute redirect on branch misprediction.

  • Created new FlushingCriteria object for determining which instructions to flush (inclusive/exclusive flush)
  • Remove code which set scoreboard bits on flush which then led to fixing flushing of instructions from the issue queues in the executepipe/lsu with outstanding scoreboard callbacks
  • Updated flush method in ROB to walk the reorder buffer backwards sending the flushed instructions to rename
  • Moved ROB allocation from dispatch to rename as I ran into a race condition where renamed instructions updated the map table, but then weren't included in the ROB walk because they were stalled in dispatch.

On the last point, moving ROB allocation to rename isn't the only solution, and even now it requires that the ROB is updated before the flush creating a constraint on the reorder buffer write port being scheduled on the PortUpdate phase with a 1 cycle delay.

  1. I thought about keeping a similar ROB structure within rename to ensure all instructions that update the rename table are captured.
  2. Also considered using the sparta::SharedData container around the rename map and cancelling NS on flush.
  3. Or keeping two versions of the map, the current one and another updated as instructions are written to the ROB.

I'm sure there are more (better) ideas out there..

@danbone
Copy link
Contributor Author

danbone commented Jan 3, 2024

Moving forward following the discussion in our last meeting on the 20th of December. I think we should separate out the changes needed for flushing. Then rebase the branch prediction API branch on top of it and close this PR

@danbone danbone closed this Jan 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants