Bug 879

Summary: Formal proof of soc.experiment.compalu_multi.MultiCompUnit needed
Product: Libre-SOC's first SoC Reporter: Cesar Strauss <cestrauss>
Component: Formal VerificationAssignee: Cesar Strauss <cestrauss>
Status: RESOLVED FIXED    
Severity: enhancement CC: libre-soc-bugs, lkcl
Priority: High    
Version: unspecified   
Hardware: PC   
OS: Linux   
NLnet milestone: NLNet.2019.10.032.Formal total budget (EUR) for completion of task and all subtasks: 2500
budget (EUR) for this task, excluding subtasks' budget: 2500 parent task for budget allocation: 197
child tasks for budget allocation: The table of payments (in EUR) for this task; TOML format:
cesar={amount=2500,submitted=2022-10-12,paid=2022-11-02}
Bug Depends on:    
Bug Blocks: 197    

Description Cesar Strauss 2022-07-01 14:10:27 BST
In short, MultiCompUnit:

1) stores an opcode from Issue, when not "busy", and "issue" is pulsed
2) signals "busy" high
3) fetches its operand(s), if any (which are not masked or zero) from the Scoreboard (REQ/REL protocol)
4) starts the ALU (ready/valid protocol), as soon as all inputs are available
5) captures result from ALU (again ready/valid)
5) sends the result(s) back to the Scoreboard (again REQ/REL)
6) pulses "done", and drops "busy"

Note that, if the conditions are right, many of the above can occur together, on a single cycle.

The formal proof involves ensuring that:
1) the ALU gets the right opcode from Issue
2) the ALU gets the right operands from the Scoreboard
3) the Scoreboard receives the right result from the ALU
4) no transactions are dropped or repeated
Comment 1 Cesar Strauss 2022-10-12 19:57:14 BST
The work done on the formal proof of MultiCompUnit is here:
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/experiment/formal/proof_compalu_multi.py;h=2bab836ab8f2eda3778383ed0811303972250e3e;hb=099825f7d53fb64adb30e363fd359eda3d1f14cd

It did caught issues and subtle bugs, unseen so far, which managed to keep hidden from our current unit tests.

The most serious bug is improper assertion of rel_o, for a masked operand, when the previous instruction had zero_a or imm_ok. This is due to the src_l latch currently not being reset, on issue.

This particular sequence was not present on the unit tests. In fact, there were already tests, which tested each of zero_a, imm_ok, and masked operands. They were just not adjacent to each other... Sure enough, I moved the tests to be next to each other, and it did reproduce the bug!

I guess the unexpected operand read (which was thrown away) did not affect results, so far.

Another bug found is the req_l latch not being clear on system reset. Even if the holding register of SRLatch is really reset-less, it is still reset at startup, because the latch reset port has reset=1 (https://git.libre-soc.org/?p=nmutil.git;a=blob;f=src/nmutil/latch.py;h=e2d7541d396fa7468668f4bd903346abd504a94f;hb=8aa377705c96f365606950c0fa142e40546e566b#l69). But, this only works if the latch reset port is driven by the sync domain. In case of req_l, it's driven by the comb domain, so the latch is not reset on system reset.

A minor issue is, on handshakes, to assume the other end only sends pulses when requested/allowed. For instance, MultiCompUnit is not resilient to:
1) Receiving a new issue pulse when busy is already high
2) Receiving a go signal when rel is low

If desired, just replace the use of pulses with the appropriate AND of the handshake signals.

All in all, I consider the formal verification work a success in catching subtle bugs in MultiCompUnit.
Comment 2 Luke Kenneth Casson Leighton 2022-10-12 20:22:54 BST
(In reply to Cesar Strauss from comment #1)

> All in all, I consider the formal verification work a success in catching
> subtle bugs in MultiCompUnit.

fantastic, cesar, we should definitely consider one for LDSTCompUnit,
in the next grant.