Bug 270

Summary: investigate nmigen clock gating
Product: Libre-SOC's first SoC
Reporter: Luke Kenneth Casson Leighton <lkcl>
Component: Source Code
Assignee: Luke Kenneth Casson Leighton <lkcl>
Status: CONFIRMED ---
Severity: enhancement
CC: libre-soc-bugs, staf
Priority: ---
Version: unspecified
Hardware: PC
OS: Linux
NLnet milestone: ---
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:

Comment 1 Luke Kenneth Casson Leighton 2020-03-28 14:34:31 GMT
> The principle is that you save power by not clocking the parts of the circuit
> that don't have to do any computing. I think this could be a more
> general way to only enable the stages in your pipeline that are actually
> doing computation.

ok so if i understand this correctly:

* the clock still runs at 1600 MHz
* the clock drives a cyclic shift register, of length equal to the
  number of stages, at 1600 MHz.
* only every *alternate* one of the elements in the shift register
  is set (or, if you want full speed, all of them).
* through EnableInserter, each stage is enabled by a *different* bit
  of the shift register
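A minimal pure-Python model of the scheme in the list above. It assumes that bit i of the shift register drives the enable of stage i, and that the register rotates by one position per clock cycle. The function names `rotate` and `enable_schedule` are hypothetical and for illustration only. The model captures the enable pattern, not nmigen hardware.

```python
from typing import List


def rotate(bits: List[int]) -> List[int]:
    """Cyclic shift by one position: the last bit wraps round to the front,
    so bit k moves to position k+1."""
    return bits[-1:] + bits[:-1]


def enable_schedule(pattern: List[int], cycles: int) -> List[List[int]]:
    """For each clock cycle, return the indices of the stages whose enable
    bit is set. Stage i is (by assumption) driven by bit i of the register."""
    state = list(pattern)
    schedule = []
    for _ in range(cycles):
        schedule.append([i for i, bit in enumerate(state) if bit])
        state = rotate(state)  # the register itself is clocked every cycle
    return schedule


if __name__ == "__main__":
    print(enable_schedule([1, 0, 1, 0], 4))  # alternate stages enabled
    print(enable_schedule([1, 1, 1, 1], 2))  # full speed
```

With the alternate pattern `[1, 0, 1, 0]`, stages 0 and 2 are enabled on even cycles and stages 1 and 3 on odd cycles. An operation latched by stage 0 on cycle 0 is therefore picked up by stage k on cycle k, while stage 0 can only accept a new operation every other cycle. With `[1, 1, 1, 1]`, every stage is enabled on every cycle.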

> That said I think this feature does not fit in the MVP scope of the October
> prototype so that chip should IMO not use clock gating nor the pass-through
> register feature from the original discussion. 

no, i agree, and, more to the point, we don't need it for the 180nm ASIC
(except perhaps to test the concept).

one thing that we have going for us: because we use OO python, the
stages themselves are *completely* separated behind a general-purpose
API, so constructing pipelines from those stages using an entirely
different pipeline technique is *literally* a one-line change.

so we could conceivably convert the *entire* suite of pipelines to use
this clock gating technique *literally* in well under a day, after first
experimenting with EnableInserter and a quick and simple unit test.

re-running the IEEE754 FP unit tests on the other hand... *sigh* :)
Comment 2 Staf Verhaegen 2020-03-28 16:20:53 GMT
(In reply to Luke Kenneth Casson Leighton from comment #1)
> ok so if i understand this correctly:
> 
> * the clock still runs at 1600 MHz
> * the clock drives a cyclic shift register, of length equal to the
>   number of stages, at 1600 MHz.
> * only every *alternate* one of the elements in the shift register
>   is set (or, if you want full speed, all of them).
> * through EnableInserter, each stage is enabled by a *different* bit
>   of the shift register

Correct, the clock is the pipeline clock. In theory, other parts of the CPU could, for example, run at half the clock frequency. The pipeline will then naturally commit at most one new operation every other cycle.

I have not tested it, but EnableInserter should work both in simulation and on an FPGA. Depending on the FPGA, you likely won't see the full power savings, as I think the enable is implemented as a clock-enable input on each FF rather than by gating parts of the clock tree. It will still guarantee that the outputs of the FFs don't change.
As said, implementing clock gating for ASICs will not be a simple task.
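A rough behavioural model of the distinction drawn above. It counts the clock edges delivered to the flip-flop as a crude stand-in for clock-tree switching power; this is an assumption, since real power depends on much more. `enable_ff` models a FF with a clock-enable input, which is how an FPGA would typically implement EnableInserter. `gated_clock_ff` models a FF behind an ASIC-style clock gate. Both names are hypothetical.

```python
from typing import List, Tuple


def enable_ff(d: List[int], en: List[int]) -> Tuple[List[int], int]:
    """D flip-flop with a clock-enable input (FPGA style): the clock edge
    reaches the FF every cycle; when en is low, Q is fed back unchanged."""
    q, out, edges = 0, [], 0
    for d_t, en_t in zip(d, en):
        edges += 1              # clock edge delivered on every cycle
        q = d_t if en_t else q  # recirculating mux holds the old value
        out.append(q)
    return out, edges


def gated_clock_ff(d: List[int], en: List[int]) -> Tuple[List[int], int]:
    """Plain D flip-flop behind a clock gate (ASIC style): when en is low
    the clock edge is suppressed entirely, so Q cannot change."""
    q, out, edges = 0, [], 0
    for d_t, en_t in zip(d, en):
        if en_t:
            edges += 1          # only enabled cycles clock the FF
            q = d_t
        out.append(q)
    return out, edges


if __name__ == "__main__":
    print(enable_ff([5, 6, 7, 8], [1, 0, 1, 0]))
    print(gated_clock_ff([5, 6, 7, 8], [1, 0, 1, 0]))
```

Both versions produce the same Q sequence, which is the guarantee that the FF outputs don't change while disabled. Only the gated version also avoids clocking the FF on disabled cycles, and that is where the extra power saving comes from.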