Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[otbn,dv] Refactor FSM states in otbn simulator #23966

Merged
merged 10 commits into from
Jul 10, 2024

Conversation

rswarbrick
Copy link
Contributor

These changes are to get the otbn_escalate test working properly: there was a rather "bodged" model of how secure wipe should work and it didn't match the RTL (or make sense!)

The important change is the penultimate one, which splits the "wiping" state into two pieces. This means that we correctly track "waiting for seed" before "performing the wipe", which could end up being tracked backwards in some situations before.

But this doubles the number of "wiping" states and (many years ago) I unwisely differentiated between WIPING_GOOD and WIPING_BAD to distinguish whether we expected to lock when the wipe was complete. Rather than adding two more states, I've merged those two states back together (in the "Merge WIPING_GOOD, WIPING_BAD" commit) so the net change is zero!

Fixes #20835 (phew!)

@rswarbrick rswarbrick added Component:DV DV issue: testbench, test case, etc. IP:otbn labels Jul 9, 2024
@rswarbrick rswarbrick requested a review from a team as a code owner July 9, 2024 10:30
@rswarbrick rswarbrick requested review from hcallahan-lowrisc and removed request for a team July 9, 2024 10:30
@rswarbrick rswarbrick force-pushed the otbn-escalate-tidyup branch from 39ed6b3 to 33097f5 Compare July 9, 2024 10:50
@rswarbrick rswarbrick force-pushed the otbn-escalate-tidyup branch 2 times, most recently from 23124b9 to e89ff65 Compare July 9, 2024 16:43
The existing code had assumptions about how the state FSM works that
we don't want to be true. Avoid the assumptions by just warping to
"after secure wipe" and then stopping when we start the final wipe.

Signed-off-by: Rupert Swarbrick <[email protected]>
@rswarbrick rswarbrick force-pushed the otbn-escalate-tidyup branch from e89ff65 to 9e87591 Compare July 10, 2024 10:37
@rswarbrick
Copy link
Contributor Author

The latest force-push should sort out CI. It turns out that the previous version of my tweaking of standalonesim.py seeded URND on every cycle (with the same seed). It turns out that there are some crypto tests that fail if you have a constant random number generator! Should now be fixed.

Copy link
Contributor

@hcallahan-lowrisc hcallahan-lowrisc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks reasonable to me, so I'm going to approve it. However, I'm not familiar enough with the system to confirm the new model matches the spec/intent. Given that this is about closing test corner-cases against the implementation which has been reviewed, discussed and documented, I think it's probably fine to merge.
Just some comment nits. This looked very tricky to get right! Nice job @rswarbrick :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit. Commit message for 06c9386

is_good = self.state.get_fsm_state() == FsmState.WIPING_GOOD
locking = self.state.rma_req == LcTx.ON or not is_good

# If there is ann RMA request, make sure we're in the WIPING_BAD state
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit. ann

Comment on lines 432 to 433
# an RMA request. If there is an invalid RMA signal or we were
# in the WIPING_BAD state (so had the locking signal set) then
# we should lock. Otherwise, jump to IDLE.
# in the going to lock after wipe (so had the locking signal
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit. grammar here is a bit off

Comment on lines 57 to 59
state (ending in updating the STATUS register to show we're done). The
difference we then goes back to IDLE or go to LOCKED (depending on the
lock_after_wipe flag).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit. grammar

self.wipe_cycles = -1

# This controls which state we move to when the next secure wipe
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... ends?

# set lock_after_wipe (since we're going to lock when we're done)
if self.state.rma_req == LcTx.ON and is_good:
self.state.set_fsm_state(FsmState.WIPING)
# If there is ann RMA request, ensure lock_after_wipe is set (since
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit. ann

This was added in commit a491a03. I'm not completely convinced that
the change was correct, and it isn't really explained in the commit
message. (And it did break something else for me). I will remove the
short-circuit for now. If it turns out to matter, I'll put it back in
a follow-up commit, with extra notes that explain what it's doing.

Signed-off-by: Rupert Swarbrick <[email protected]>
No functional change, but this silences a warning from mypy.

Signed-off-by: Rupert Swarbrick <[email protected]>
No functional change, but this avoids some mypy warnings.

Signed-off-by: Rupert Swarbrick <[email protected]>
This is to correctly handle a situation where we decide to jump to the
locked state in the middle of a secure wipe. Before this change, we
would print out

 ..., U, U, U, U
  (jump to locked state)
 V, STALL, STALL, STALL, ...

and the trace checker would get upset that the model sent a V (meaning
that a wipe had completed) and then a STALL without any corresponding
event from the RTL side.

With this change, we get ..., U, U, U, U, STALL, STALL, ... and avoid
the error.

Signed-off-by: Rupert Swarbrick <[email protected]>
A bogus RMA request only matters at specific states in the start/stop
controller FSM. Change the model so that it similarly only looks for
bad values at those times. Annoyingly, this takes effect "just after"
we finish the secure wipe, so we need to model it with a new
delayed_lock variable.

Signed-off-by: Rupert Swarbrick <[email protected]>
I split these apart when I first handled secure wipe. It seemed like a
good idea at the time! But it's not so great, especially if I want to
split out the "waiting for URND seed" state (and don't want to
duplicate that). Handle the state we're wiping towards with a separate
flag.

Signed-off-by: Rupert Swarbrick <[email protected]>
@rswarbrick rswarbrick force-pushed the otbn-escalate-tidyup branch from 9e87591 to 1e99ed8 Compare July 10, 2024 14:15
@rswarbrick
Copy link
Contributor Author

Thanks for the careful review, and for spotting the slightly spotty grammar from me!

@rswarbrick rswarbrick merged commit e3d1738 into lowRISC:master Jul 10, 2024
2 of 8 checks passed
@rswarbrick rswarbrick deleted the otbn-escalate-tidyup branch July 10, 2024 14:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component:DV DV issue: testbench, test case, etc. IP:otbn
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[otbn] Correct model for handling staggered rma_req_i transitions
2 participants