Initial implementation of branch prediction using a BTB #124

danbone · 2023-11-27T18:32:45Z

Attempting to add some branch prediction logic to Olympia. I started with a branch target buffer.

Plan is to add a simple predictor, and a return address stack next. As this will require some form of recovery following speculative updates. Forming a foundation for further branch prediction modelling.

Early branch misprediction recovery could be done at a later date, but will require some more complex mechanism for restoring rename map before restarting.

Few points of interest:

Adding flushing on misprediction has pushed the runtime up considerable, dhrystone takes 250s+ to simulate compared to 12secs on master. It's only doing 10k flushes, I'm surprised that it has had such an impact.
The BTB model is basically just a clone of SimpleCache2 but the address decoder was a protected const, and I needed to change it. Any suggestion on a better way to do this?
I intend to send all the flushed instructions to rename serially so to free up the renaming registers. This has an added bonus as it can be used to cancel the scoreboard callbacks. Unless there's a better suggestion?
Should I update the JSON instruction generator? Will require new taken/target fields
I want the branch prediction stuff encapsulated a bit more, I was thinking of having a new flush field in the instruction set on prediction, that can trigger flushes within the different modules. i.e. BTB miss could set flush to DECODE_FLUSH, branch misprediction sets EXECUTE_FLUSH. Would have to consider whether the instruction that triggers the flush is included or excluded from the flush. Thoughts?
Any ideas on what tests to add?

klingaard · 2023-11-27T19:40:24Z

This is awesome! Thanks for putting this together. I'll check it out in more detail in a bit. I did take a few minutes at lunch to address your first POI:

Adding flushing on misprediction has pushed the runtime up considerable, dhrystone takes 250s+

I ran your branch through Apple's Instruments and found that Rename::scheduleRenaming is taking a huge amount of time. I added an assert on the size of uop_queeu_regcount_ data structure to make sure it doesn't grow big. It does! I think this data structure is not being cleared out properly on a flush. @aarongchan you might need to add this assert to make sure folks update Rename on flush situations.

klingaard · 2023-11-27T19:53:12Z

Yeah, if you add (untested) this line to the bottom of Rename::handleFlush_ performance goes back to normal:

uop_queue_regcount_data_.clear();

core/BTB.hpp

klingaard · 2023-11-27T20:06:44Z

core/BTB.hpp

+        void reset(uint64_t addr)
+        {
+            setValid(true);
+            setAddr(addr);
+            resetLHistCounter_();
+        }


I've never been a fan of reset methods. Instead, you can do this after adding a new constructor that accepts an address:

btb_entries_[idx] = BTBEntry(addr);

I'm not sure I can do this (unless I'm not understanding something, fairly new to C++).

The problem is that we're using the Cache and BasicCacheSet as the underlying data structures. The BTB acts as a shim.

The Cache returns pointers to the BTBEntry objects stored within a vector inside BasicCacheSet, the user can then manipulate the object using the pointer, but there isn't a way to put a new object into the vector down in the Set class.

Is there a way I can use the constructor you suggested to copy into the pointer object on the RHS?

(I tried to experiment a bit, but couldn't get the syntax right - complains that the btb_entry is a pointer and if use new then I think you'd get a memory leak?)

Curious on your experiment...

Anyway, this would work to "replace" or reset the object deep in the guts of the cache, me thinks (forgive the pseudo code):

auto entry_ptr = cache->getEntry(); *entry_ptr = BTBEntry();

core/BTB.hpp

core/Fetch.cpp

klingaard · 2023-11-27T20:27:13Z

core/BTB.hpp

For the API, I suggest you don't use references as the parameters, but instead use pointers:

(const BTBEntry * line)

Reason for this, look at this example of buggy code:

auto btb_entry = btb_->getEntry(); doSomeStuffToUpdateEntry_(&btb_entry);

Does getEntry return reference or a copy? The auto might make a copy and the call to doSomeStuffToUpdateEntry_ will not update the original.

klingaard · 2023-11-27T20:29:21Z

core/Rename.cpp

So the changes here reset the entirety of rename, preventing support for partial flushes. 🤔 -- would you be up to updating the flushing criteria to include those instructions being flushed?

Early branch misprediction recovery could be done at a later date, but will require some more complex mechanism for restoring rename map before restarting.

Ah, you already mentioned this...

arupc · 2023-11-29T08:51:09Z

Thank you for initiating this.

To provide more history, per previous discussion, we intend to add branch prediction support via an API, along with an implementation of simple branch predictor. This would enable use of a simple predictor while allowing for implementation of other predictors. There's also an interest in re-using existing branch predictor implementations

Please see #1

I will start a discussion item on this topic. Let's collaborate via the discussion thread and review a proposal for branch prediction API

klingaard · 2023-11-30T15:35:05Z

From our discussion in the SIG yesterday:

This is the first PR that actually hammers on the flushing mechanism (and actually found a bug).
While this doesn't use the proposed branch API from @arupc and team, it's not a show-stopper for moving forward with this PR
The flushing/redirect occurs in Decode, but only for a BTB miss (never seen before branch). When this occurs the entire pipe is flushed. Not sure how safe this is right. Might behoove us to actually implement a proper redirect from execution instead.
The PR is missing some fundamental testing. Specifically, we'd like to see a unit test just for the BTB as it stands so it can be grown over time. Also, augmentation of the JSON format might be handy to support taken/not taken branch behavior to test flushing, etc.

danbone · 2023-12-12T23:59:20Z

Updated branch to walk the ROB when restoring the rename map table following a flush, to fix the hack I did in the previous commit, and in anticipation of adding execute redirect on branch misprediction.

Created new FlushingCriteria object for determining which instructions to flush (inclusive/exclusive flush)
Remove code which set scoreboard bits on flush which then led to fixing flushing of instructions from the issue queues in the executepipe/lsu with outstanding scoreboard callbacks
Updated flush method in ROB to walk the reorder buffer backwards sending the flushed instructions to rename
Moved ROB allocation from dispatch to rename as I ran into a race condition where renamed instructions updated the map table, but then weren't included in the ROB walk because they were stalled in dispatch.

On the last point, moving ROB allocation to rename isn't the only solution, and even now it requires that the ROB is updated before the flush creating a constraint on the reorder buffer write port being scheduled on the PortUpdate phase with a 1 cycle delay.

I thought about keeping a similar ROB structure within rename to ensure all instructions that update the rename table are captured.
Also considered using the sparta::SharedData container around the rename map and cancelling NS on flush.
Or keeping two versions of the map, the current one and another updated as instructions are written to the ROB.

I'm sure there are more (better) ideas out there..

danbone · 2024-01-03T13:19:37Z

Moving forward following the discussion in our last meeting on the 20th of December. I think we should separate out the changes needed for flushing. Then rebase the branch prediction API branch on top of it and close this PR

Initial implementation of branch prediction using a BTB

ff61518

klingaard assigned danbone Nov 27, 2023

klingaard added the enhancement New feature or request label Nov 27, 2023

klingaard reviewed Nov 27, 2023

View reviewed changes

Daniel Bone added 2 commits November 28, 2023 09:15

Clear up uop_queue_regcount_data on flush

371f6ca

Clean up BTB changes

0820c37

danbone mentioned this pull request Nov 29, 2023

Add a branch predictor #1

Open

Walk ROB to recover rename map table after flush

46e2a78

Daniel Bone added 2 commits December 13, 2023 12:59

Instigate misprediction flush from execute

e9b584c

revert ROB back from Buffer to InstQueue

e6efd1f

danbone closed this Jan 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Initial implementation of branch prediction using a BTB #124

Initial implementation of branch prediction using a BTB #124

danbone commented Nov 27, 2023 •

edited

Loading

klingaard commented Nov 27, 2023

klingaard commented Nov 27, 2023

klingaard Nov 27, 2023

danbone Nov 28, 2023

klingaard Nov 30, 2023

klingaard Nov 27, 2023

klingaard Nov 27, 2023

klingaard Nov 27, 2023

arupc commented Nov 29, 2023

klingaard commented Nov 30, 2023

danbone commented Dec 12, 2023 •

edited

Loading

danbone commented Jan 3, 2024 •

edited

Loading

Initial implementation of branch prediction using a BTB #124

Initial implementation of branch prediction using a BTB #124

Conversation

danbone commented Nov 27, 2023 • edited Loading

klingaard commented Nov 27, 2023

klingaard commented Nov 27, 2023

klingaard Nov 27, 2023

Choose a reason for hiding this comment

danbone Nov 28, 2023

Choose a reason for hiding this comment

klingaard Nov 30, 2023

Choose a reason for hiding this comment

klingaard Nov 27, 2023

Choose a reason for hiding this comment

klingaard Nov 27, 2023

Choose a reason for hiding this comment

klingaard Nov 27, 2023

Choose a reason for hiding this comment

arupc commented Nov 29, 2023

klingaard commented Nov 30, 2023

danbone commented Dec 12, 2023 • edited Loading

danbone commented Jan 3, 2024 • edited Loading

danbone commented Nov 27, 2023 •

edited

Loading

danbone commented Dec 12, 2023 •

edited

Loading

danbone commented Jan 3, 2024 •

edited

Loading