Bug 323

Summary: create POWER9 MUL pipeline
Product: Libre-SOC's first SoC
Reporter: Luke Kenneth Casson Leighton <lkcl>
Component: Source Code
Assignee: Jacob Lifshay <programmerjake>
Status: RESOLVED FIXED
Severity: enhancement
CC: libre-soc-bugs, programmerjake
Priority: ---    
Version: unspecified   
Hardware: Other   
OS: Linux   
See Also: https://bugs.libre-soc.org/show_bug.cgi?id=305
https://bugs.libre-soc.org/show_bug.cgi?id=462
NLnet milestone: NLNet.2019.10.043.Wishbone
total budget (EUR) for completion of task and all subtasks: 750
budget (EUR) for this task, excluding subtasks' budget: 750
parent task for budget allocation: 383
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:
"lkcl"={amount=250, paid=2020-08-21}
"jacob"={amount=500, paid=2020-08-21}
Bug Depends on: 356, 432, 448, 419    
Bug Blocks: 383    

Description Luke Kenneth Casson Leighton 2020-05-19 13:01:46 BST
a MUL pipeline is needed similar to the other pipelines in soc.fu, covering MUL operations.

https://git.libre-soc.org/?p=soc.git;a=tree;f=src/soc/fu/mul;hb=HEAD
Comment 1 Luke Kenneth Casson Leighton 2020-05-19 13:24:58 BST
there are actually two different types of MUL here.

* VA Form - 3 int in, no carry/overflow
* X Form - usual style just like ALU/Logical

my feelings are mixed, as that is a lot of ports if they are combined. still, actually, after some thought, it is the same port allocation (after combining) as Shift.


# Multiply-Add High Doubleword VA-Form

VA-Form

* maddhd RT,RA,RB,RC

    prod[0:127] <- (RA) * (RB)
    sum[0:127] <- prod + EXTS(RC)
    RT <- sum[0:63]

Special Registers Altered:

    None
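
The maddhd pseudocode above can be modelled in plain Python as a quick cross-check. This is a minimal sketch, not the Libre-SOC implementation: the helper names `to_signed` and `maddhd` are made up for illustration, and registers are assumed to be passed in as unsigned 64-bit integers.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def maddhd(ra, rb, rc):
    """Model of maddhd: signed 64x64 multiply, add EXTS(RC), keep high half.

    In the pseudocode's MSB0 numbering, sum[0:63] is the *upper* 64 bits
    of the 128-bit sum, hence the shift right by 64.
    """
    prod = to_signed(ra, 64) * to_signed(rb, 64)
    total = (prod + to_signed(rc, 64)) & MASK128
    return total >> 64


# -2**63 * 2 = -2**64, whose upper 64 bits are all ones
print(hex(maddhd(1 << 63, 2, 0)))
```

Python integers are arbitrary precision, so the 128-bit intermediate needs no special handling beyond the final mask.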
Comment 2 Luke Kenneth Casson Leighton 2020-05-20 01:35:55 BST
https://git.libre-soc.org/?p=soc.git;a=commitdiff;h=a60febdeb1c572a4b85b410c6519383fc581732d

i moved mul operations over to a MUL Function Unit.  the unit test,
test_pipe_caller.py, when cookie-cut copied over, should then be changed:

                    fn_unit = yield pdecode2.e.fn_unit
                    self.assertEqual(fn_unit, Function.SHIFT_ROT.value)

to:

                    fn_unit = yield pdecode2.e.fn_unit
                    self.assertEqual(fn_unit, Function.MUL.value)

really we should look at some point at deriving a class to contain
the common code from all these tests, soc.fu.*.test.test_pipe_caller.py
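
One possible shape for such a shared class is sketched below. Everything here is hypothetical: `Function` is a stand-in enum with made-up values (the real one lives in the decoder), and `PipeCallerCheck`, `expected_fn_unit` and `check_fn_unit` are illustrative names, not existing Libre-SOC code. The idea is only that each pipeline's test declares which Function Unit it expects instead of editing a copied assert.

```python
import enum
import unittest


class Function(enum.Enum):
    """Stand-in for the decoder's Function enum (values are made up)."""
    SHIFT_ROT = 1
    MUL = 2


class PipeCallerCheck:
    """Hypothetical mixin holding code common to all test_pipe_caller.py."""
    expected_fn_unit = None  # each pipeline's test overrides this

    def check_fn_unit(self, fn_unit):
        # the one line that previously had to be hand-edited per pipeline
        self.assertEqual(fn_unit, self.expected_fn_unit.value)


class MulPipeCheck(PipeCallerCheck, unittest.TestCase):
    expected_fn_unit = Function.MUL

    def test_fn_unit(self):
        # in the real test this value would come from pdecode2.e.fn_unit
        self.check_fn_unit(Function.MUL.value)


def run_checks():
    """Run the MUL check and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(MulPipeCheck)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```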
Comment 3 Luke Kenneth Casson Leighton 2020-05-27 23:21:25 BST
from microwatt: how to set up the inputs to the mul pipeline.  this can go in main_stage.py when calling the mul unit:



if e_in.is_32bit = '1' then
    if e_in.is_signed = '1' then
        -- sign-extend the low 32 bits to the full multiplier width
        x_to_multiply.data1 <= (others => a_in(31));
        x_to_multiply.data1(31 downto 0) <= a_in(31 downto 0);
        x_to_multiply.data2 <= (others => b_in(31));
        x_to_multiply.data2(31 downto 0) <= b_in(31 downto 0);
    else
        -- zero-extend the low 32 bits
        x_to_multiply.data1 <= '0' & x"00000000" & a_in(31 downto 0);
        x_to_multiply.data2 <= '0' & x"00000000" & b_in(31 downto 0);
    end if;
else
    if e_in.is_signed = '1' then
        -- 64-bit signed: replicate the sign bit into bit 64
        x_to_multiply.data1 <= a_in(63) & a_in;
        x_to_multiply.data2 <= b_in(63) & b_in;
    else
        -- 64-bit unsigned: bit 64 is zero
        x_to_multiply.data1 <= '0' & a_in;
        x_to_multiply.data2 <= '0' & b_in;
    end if;
end if;
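
The same operand setup can be written as a small Python model, which may help when porting it to main_stage.py. This is a sketch only: `extend_operand` is a hypothetical helper, and it assumes (from the concatenations in the VHDL) that the multiplier inputs are 65 bits wide.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def extend_operand(value, is_32bit, is_signed):
    """Return the 65-bit pattern fed to the multiplier for one operand."""
    if is_32bit:
        low = value & MASK32
        if is_signed and (low >> 31):
            # copy bit 31 into bits 32..64 (33 copies of the sign bit)
            return low | (((1 << 33) - 1) << 32)
        return low  # upper 33 bits are zero
    full = value & MASK64
    if is_signed and (full >> 63):
        # copy bit 63 into bit 64
        return full | (1 << 64)
    return full  # bit 64 is zero


print(hex(extend_operand(0xFFFF_FFFF, is_32bit=True, is_signed=True)))
```

Extending to 65 bits lets a single signed multiplier handle both the signed and unsigned cases, since an unsigned 64-bit value always has a zero in bit 64.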
Comment 4 Luke Kenneth Casson Leighton 2020-07-06 19:33:33 BST
i made a start on this, no multi stage, just to get at least something moving forward.

immediately found an issue with the simulator pseudocode.  mulli operands are supposed to be signed and it alters the output considerably.

this will need alteration of the pseudocode, even to the extent of creating a special MULS function.
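
The comment does not say exactly where the simulator went wrong, so the following is only an illustration of how much signedness can matter for mulli, assuming the immediate is a 16-bit field that should be sign-extended. `mulli` and `to_signed` are hypothetical helpers, not the simulator's functions.

```python
MASK64 = (1 << 64) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mulli(ra, si, signed_imm=True):
    """Low 64 bits of RA * SI, with SI treated as signed or unsigned."""
    imm = to_signed(si, 16) if signed_imm else si & 0xFFFF
    return (to_signed(ra, 64) * imm) & MASK64


# 3 * -1 versus 3 * 65535
print(hex(mulli(3, 0xFFFF)))
print(hex(mulli(3, 0xFFFF, signed_imm=False)))
```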
Comment 5 Luke Kenneth Casson Leighton 2020-07-09 10:54:30 BST
commit 512e2d72912ba57913ab1b1297a085d5fae67181 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Thu Jul 9 10:52:46 2020 +0100

    add new stages etc. to get multiply working without xer_ca

removing xer_ca from the DIV and MUL pipelines (both on input and
output) needs a bit of tweaking.

it's important because unnecessary registers being read/written to
creates dependencies that create chaining and prevent opportunities
for parallelism.
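
To make the dependency argument concrete, here is a toy hazard check. It is a deliberately simplified sketch: `Op` and `must_wait` are hypothetical, the register names are arbitrary, and it does not reflect how the Libre-SOC dependency tracking is actually built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    reads: frozenset
    writes: frozenset


def must_wait(earlier, later):
    """True if `later` has a read/write hazard against `earlier`."""
    return bool(
        earlier.writes & later.reads       # read-after-write
        or earlier.writes & later.writes   # write-after-write
        or earlier.reads & later.writes    # write-after-read
    )


addc = Op("addc", frozenset({"r1", "r2"}), frozenset({"r3", "xer_ca"}))
# a multiply that needlessly declares xer_ca on input and output
mul_with_ca = Op("mullw", frozenset({"r4", "r5", "xer_ca"}),
                 frozenset({"r6", "xer_ca"}))
# the same multiply with xer_ca removed
mul_no_ca = Op("mullw", frozenset({"r4", "r5"}), frozenset({"r6"}))

print(must_wait(addc, mul_with_ca))
print(must_wait(addc, mul_no_ca))
```

With xer_ca declared, the multiply is chained behind the carry-producing add even though it never uses the carry; without it, the two can proceed in parallel.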
Comment 6 Luke Kenneth Casson Leighton 2020-07-10 22:32:26 BST
hmm, to match the exact behaviour of IBM's POWER9 core it is necessary to modify the pseudocode of mulhwu and mulhw to return the 32 bits of the product mapped *twice*.

this is exactly what microwatt does.

the second modification needed is going to be in creating a variable named overflow in the pseudocode and returning it.

the microwatt test is quite neat: overflow is flagged when the hi bits are neither all zeros nor all 1s.

this can be easily expressed in the pseudocode.
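
A rough Python model of the "mapped twice" result, assuming the instructions in question are the multiply-high-word pair (mulhw for the signed case is shown). `mulhw_dup` and `to_signed` are hypothetical names; this is a sketch of the behaviour described, not the pseudocode itself.

```python
MASK32 = (1 << 32) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mulhw_dup(ra, rb):
    """High 32 bits of the signed 32x32 product, copied into both halves."""
    prod = to_signed(ra, 32) * to_signed(rb, 32)
    high = (prod >> 32) & MASK32
    return (high << 32) | high


# 2**30 * 4 = 2**32, so the high word is 1 and appears in both halves
print(hex(mulhw_dup(0x4000_0000, 4)))
```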
Comment 7 Luke Kenneth Casson Leighton 2020-07-25 22:28:04 BST
ha, hilarious

    overflow <- ((prod[0:32] != 0x0_0000_0000) &
                 (prod[0:32] != 0x1_ffff_ffff))

that's in hexadecimal, which is 36 bits long, not 33.  so the
pseudocode rightly complains.

i changed it to [0]*33 and [1]*33 and that works.
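
The width mismatch and the corrected test can both be checked in Python. This sketch assumes the overflow test applies to a signed 32x32 multiply with a 64-bit product, so that prod[0:32] in MSB0 numbering is the top 33 bits; `hex_literal_bits` and `mul32_overflow` are hypothetical helpers.

```python
MASK64 = (1 << 64) - 1
ALL_ONES_33 = (1 << 33) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def hex_literal_bits(literal):
    """Width implied by a hex literal: 4 bits per digit, '_' ignored."""
    return 4 * len(literal.lower()[2:].replace("_", ""))


def mul32_overflow(ra, rb):
    """True when the signed 32x32 product does not fit in 32 bits."""
    prod = (to_signed(ra, 32) * to_signed(rb, 32)) & MASK64
    top33 = prod >> 31  # prod[0:32] in MSB0 numbering
    return top33 != 0 and top33 != ALL_ONES_33


print(hex_literal_bits("0x1_ffff_ffff"))
print(mul32_overflow(0x7FFF_FFFF, 2))
```

Comparing against explicit 33-bit all-zeros and all-ones values mirrors the [0]*33 and [1]*33 fix, and avoids relying on a literal whose digit count implies a different width.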
Comment 8 Luke Kenneth Casson Leighton 2020-08-18 12:17:55 BST
jacob EUR 500, lkcl EUR 250 on this one is, i feel, reasonable.  MAC is still TODO, tests are ok; a proof is still needed, however that is separate.