| Summary: | create twin-butterfly research page into DCT/FFT instructions | ||
|---|---|---|---|
| Product: | Libre-SOC's first SoC | Reporter: | Luke Kenneth Casson Leighton <lkcl> |
| Component: | Specification | Assignee: | Konstantinos Margaritis (markos) <konstantinos> |
| Status: | RESOLVED FIXED | ||
| Severity: | enhancement | CC: | ghostmansd, konstantinos, libre-soc-isa, programmerjake |
| Priority: | --- | ||
| Version: | unspecified | ||
| Hardware: | PC | ||
| OS: | Linux | ||
| URL: | https://libre-soc.org/openpower/sv/twin_butterfly | ||
| See Also: | https://bugs.libre-soc.org/show_bug.cgi?id=1028 | ||
| NLnet milestone: | NLnet.2022-08-051.OPF | total budget (EUR) for completion of task and all subtasks: | 2500 |
| budget (EUR) for this task, excluding subtasks' budget: | 2500 | parent task for budget allocation: | 1011 |
| child tasks for budget allocation: | The table of payments (in EUR) for this task; TOML format: |
lkcl = { amount = 500, submitted = 2024-02-07, paid = 2024-02-29 }
markos = { amount = 1600, submitted = 2023-09-03, paid = 2023-09-15 }
[jacob]
amount = 400
submitted = 2024-02-09
paid = 2024-02-29
|
|
| Bug Depends on: | |||
| Bug Blocks: | 1076 | ||
|
Description
Luke Kenneth Casson Leighton
2023-04-27 16:38:41 BST
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=5b0a082545185799b7bf053374aa3b60117ef74b +| NN | RT | RA | RB | RC | sh 01 00 |0 | maddsubrs | BF-Form | current pseudocode is 5-in 2-out, since that's too much, i have an idea that might work for how to reduce that: https://libre-soc.org/irclog/%23libre-soc.2023-04-27.log.html#t2023-04-27T20:06:41 > idea: put the pair of coefficients and accumulated sums each in 1 reg with each value being the lower/upper half of a reg...this should reduce input/output regs to 4-in 1-out > idk if that'll fit the DCT pattern tho > this is kinda like how cdtbcd works where the upper and lower halves are independent > e.g. RT <- ((RT)[0:XLEN/2-1] + prod0) || ((RT)[XLEN/2:XLEN-1] + prod1) > that way if you set elwid=32 you get 2 16-bit results (In reply to Jacob Lifshay from comment #3) > current pseudocode is 5-in 2-out, since that's too much, i have an idea that > might work for how to reduce that: already sorted. copying the pre-established pattern for ffmadds it is RA that is used as the input-accumulator. this is the way twin-butterfly works RT = RA+RB*RC RS = RA-RB*RC where if you look at the unit test and also dig into the DCT Schedule you find RT=RA and RB=RS. https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=f8e2c0cb1467391aa7ae4b8b092c281ee2e16a7b + with m.If((major == 22) & xo6.matches( + '-01000', # maddsubrs + )): + comb += self.implicit_rs.eq(1) + comb += self.extend_rb_maxvl.eq(1) # extend RB this says "RB is the extended register offset by MAXVL", and detects the opcode 22 XO top 5 bits indicating maddsubrs. i have fdmadds down to 3 operands in the instruction encoding,
it is still 3-in 2-out, FRT is overwrite and FRS destination
is implicit.
DCT-Form
fdmadds FRT,FRA,FRB (Rc=0)
fdmadds. FRT,FRA,FRB (Rc=1)
Pseudo-code:
FRS <- FPADD32(FRT, FRB)
sub <- FPSUB32(FRT, FRB)
FRT <- FPMUL32(FRA, sub)
astonishingly this worked. so actually you should be able to
have a full 5-bits shift.
# 1.6.7.2 DCTI-FORM
|0 | 6 |11 |16 |21 |25 |31 |
| PO | RT | RA | RB | SH | XO | Rc |
yes like that :)
which, actually, means it can be added as a variant of A-Form
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isatables/fields.text;hb=HEAD
210 # 1.6.17 A-FORM
211 |0 |6 |11 |16 |21 |26 |31 |
212 | PO | FRT | FRA | FRB | FRC | XO |Rc |
213 | PO | FRT | FRA | FRB | /// | XO |Rc |
214 | PO | FRT | FRA | /// | FRC | XO |Rc |
215 | PO | FRT | /// | FRB | /// | XO |Rc |
216 | PO | RT | RA | RB | BC | XO | /|
217
| PO | RT | RA | RB | SH | XO | Rc |
fits perfectly.
although it looks horrible can i suggest this instead?
prod1 <- MUL(RB, sum) # RB = c
prod2 <- MUL(RB, diff) # TODO: Pick high half?
res1 <- ROTL64(prod1, XLEN-SH)
res2 <- ROTL64(prod2, XLEN-SH)
==>
prod1 <- MUL(RB, sum) # RB = c
prod2 <- MUL(RB, diff) # TODO: Pick high half?
res1 <- prod1[XLEN/2-SH:XLEN-1-SH]
res2 <- prod2[XLEN/2-SH:XLEN-1-SH]
because the ROTL64 actually requires masking out of the top
LSBs.
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=9094d03d96dc474267c587fc94b8b5fdc8244227 + res1 <- ROTL64(prod1, XLEN-SH) + res2 <- ROTL64(prod2, XLEN-SH) ha, excellent. (In reply to Luke Kenneth Casson Leighton from comment #8) > https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff; > h=9094d03d96dc474267c587fc94b8b5fdc8244227 > > > + res1 <- ROTL64(prod1, XLEN-SH) > + res2 <- ROTL64(prod2, XLEN-SH) > > ha, excellent. oh hang on, no, that doesn't quite do it, the MSBs still get rotated into LSB positions. or err the the other way round. https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=6fe2b6ccc37181c0f416df3706110ea609377746 ffmadds is now down to 3 operands in the instruction form, it is still 3-in 2-out, it's just that it looks like an X-Form now (10-bit XO) which is great. First working prototype here: https://libre-soc.org/openpower/sv/twin_butterfly/ *** Bug 1028 has been marked as a duplicate of this bug. *** *** Bug 962 has been marked as a duplicate of this bug. *** konstantinos, question: should this be (a + b + 1) * c) >> N ? you see why i suggest that? it is to do with averaging a+b where otherwise it is a FLOOR situation. holy cow you're right. this ends up as one instruction. wow.
add 9,5,4
subf 5,5,4
mullw 9,9,6
mullw 5,5,6
addi 9,9,8192
addi 5,5,8192
srawi 9,9,14
srawi 5,5,14
Documentation updated in: https://libre-soc.org/openpower/sv/twin_butterfly/ |