| Summary: | MultiCompUnit wrmask relies on sync ok | ||
|---|---|---|---|
| Product: | Libre-SOC's second ASIC | Reporter: | Luke Kenneth Casson Leighton <lkcl> |
| Component: | Milestones | Assignee: | Luke Kenneth Casson Leighton <lkcl> |
| Status: | CONFIRMED --- | ||
| Severity: | enhancement | CC: | cestrauss, libre-soc-bugs |
| Priority: | --- | ||
| Version: | unspecified | ||
| Hardware: | Other | ||
| OS: | Linux | ||
| See Also: | https://bugs.libre-soc.org/show_bug.cgi?id=737 | ||
| NLnet milestone: | --- | total budget (EUR) for completion of task and all subtasks: | 0 |
| budget (EUR) for this task, excluding subtasks' budget: | 0 | parent task for budget allocation: | |
| child tasks for budget allocation: | The table of payments (in EUR) for this task; TOML format: | ||
| Bug Depends on: | |||
| Bug Blocks: | 690 | ||
|
Description
Luke Kenneth Casson Leighton
2021-11-06 12:26:37 GMT
On https://bugs.libre-soc.org/show_bug.cgi?id=336#c75, I said: "It seems that MultiCompUnit depends on the ALU *.ok output fields being held valid long enough into the write phase. In the test ALU, I tried holding *.ok valid just for the duration of the ALU valid_o, and it didn't work. I ended up instead directly decoding the input operation, combinatorially, into the *.ok, so they do stay valid during the entire instruction cycle." Maybe related? http://lists.libre-soc.org/pipermail/libre-soc-dev/2021-November/004086.html https://git.libre-soc.org/?p=soc.git;a=commitdiff;h=809cf2faa4450901779045cfaa89e69f70ed9f42 (In reply to Cesar Strauss from comment #1) > On https://bugs.libre-soc.org/show_bug.cgi?id=336#c75, I said: > > "It seems that MultiCompUnit depends on the ALU *.ok output fields being > held valid long enough into the write phase. In the test ALU, I tried > holding *.ok valid just for the duration of the ALU valid_o, and it didn't > work. I ended up instead directly decoding the input operation, > combinatorially, into the *.ok, so they do stay valid during the entire > instruction cycle." > > Maybe related? sounds right. the issue is that wrmask goes directly into the write release, and it should have triggered the relevant latches at (and only at) the time that the ok occurs, which, of course, is when the n.i_valid occurs. that the ok has to be set and stay set *after* i_valid is a serious problem as expected, wrmask which comes from all the "ok" signals which are only supposed to be valid when n.i_valid is set, is used on the write request latches: 256 # dest operand latch (not using issue_i) 257 m.d.sync += req_l.s.eq(alu_pulsem & self.wrmask) 258 m.d.sync += req_l.r.eq(reset_w | prev_wr_go) however this does not look right: 361 # write-release gated by busy and by shadow (and write-mask) 362 brd = Repl(self.busy_o & self.shadown_i, self.n_dst) 363 m.d.comb += self.wr.rel_o.eq(req_l.q & brd & self.wrmask) wrmask gating with the wr.rel_o output is most likely the bug, it says that wrmask has to be sustained (sync'd) and it is redundant anyway. removing these seems to be ok:
--- a/src/soc/experiment/compalu_multi.py
+++ b/src/soc/experiment/compalu_multi.py
@@ -215,11 +215,9 @@ class MultiCompUnit(RegSpecALUAPI, Elaboratable):
# is enough, when combined with when read-phase is done (rst_l.q)
wr_any = Signal(reset_less=True)
req_done = Signal(reset_less=True)
- m.d.comb += self.done_o.eq(self.busy_o &
- ~((self.wr.rel_o & ~self.wrmask).bool()))
+ m.d.comb += self.done_o.eq(self.busy_o & ~(self.wr.rel_o).bool())
m.d.comb += wr_any.eq(self.wr.go_i.bool() | prev_wr_go.bool())
- m.d.comb += req_done.eq(wr_any & ~self.alu.n.i_ready &
- ((req_l.q & self.wrmask) == 0))
+ m.d.comb += req_done.eq(wr_any & ~self.alu.n.i_ready & (req_l.q == 0))
# argh, complicated hack: if there are no regs to write,
# instead of waiting for regs that are never going to happen,
# we indicate "done" when the ALU is "done"
@@ -360,7 +358,7 @@ class MultiCompUnit(RegSpecALUAPI, Elaboratable):
# write-release gated by busy and by shadow (and write-mask)
brd = Repl(self.busy_o & self.shadown_i, self.n_dst)
- m.d.comb += self.wr.rel_o.eq(req_l.q & brd & self.wrmask)
+ m.d.comb += self.wr.rel_o.eq(req_l.q & brd)
# output the data from the latch on go_write
for i in range(self.n_dst):
the reasons why it's ok is likely because:
* wrmask is the trigger (ANDed) into req_l setting)
* wr.rel_o ANDed with wrmask therefore should never be done
(and is redundant)
* almost all of the places removed are (wr.rel_o & wrmask)
the ones that i am leaving in are the "complicated hack":
with m.If((self.wrmask == 0) &
self.alu.n.i_ready & self.alu.n.o_valid & self.busy_o):
m.d.comb += req_done.eq(1)
this is a special indicator which detects when the incoming ready/valid
pulse is valid, which is supposed to be the only time when wrmask (the
Cat() of all data ok signals) is valid.
and when all data ok signals are zero, at the *exact* moment of the
incoming ready/valid, that indicates that there are no registers to
write, so there is no waiting.
now, on the face of it, it looks like wr_any covers this:
m.d.comb += wr_any.eq(self.wr.go_i.bool() | prev_wr_go.bool())
m.d.comb += req_done.eq(wr_any & ~self.alu.n.i_ready & (req_l.q == 0))
why would you have to request req_done to cancel the request when, clearly,
wr.go_i should have no requests, right?
wrong.
the special "hack" covers the case where the *ALU* decides that it has
no need to write to registers.
this is for instances where XER or CR for example (XER.so in particular)
writing is determined *by the ALU* that it has no need to write to XER.so,
saving on a reg write.
but, it was not possible to determine that before giving the ALU the
*opportunity* to write to the XER.so (or other register)
commit 0b16b46191e24089916b902a1512902c87fd782
ha, that's good: that's fixed ReservationStations2, it can now set the data combinatorial on a single pulse rather than being forced to set it for multiple cycles. Cesar would you like to have a look, make sure you're happy? i am just adding no-wait states to ReservationStations2. |