| Summary: | add SV variant of fcvt to deal with elwidth differences in OpenPOWER FP scalar formats | ||
|---|---|---|---|
| Product: | Libre-SOC's first SoC | Reporter: | Luke Kenneth Casson Leighton <lkcl> |
| Component: | Specification | Assignee: | Luke Kenneth Casson Leighton <lkcl> |
| Status: | CONFIRMED --- | ||
| Severity: | enhancement | CC: | libre-soc-isa, programmerjake |
| Priority: | --- | ||
| Version: | unspecified | ||
| Hardware: | Other | ||
| OS: | Linux | ||
| See Also: | https://bugs.libre-soc.org/show_bug.cgi?id=560 | ||
| NLnet milestone: | --- | total budget (EUR) for completion of task and all subtasks: | 0 |
| budget (EUR) for this task, excluding subtasks' budget: | 0 | parent task for budget allocation: | |
| child tasks for budget allocation: | The table of payments (in EUR) for this task; TOML format: | ||
| Bug Depends on: | |||
| Bug Blocks: | 213 | ||
Description
Luke Kenneth Casson Leighton
2020-12-31 16:54:36 GMT
xscvdphp p429 v3.0B
format of instructions around p548.

here is the location where the special-casing should be performed, jacob, but not by way of being part of the SV loop, but by this particular operation explicitly being encoded and defined as:

* input formats (src) are DEFINED as being OpenPOWER bit-spread at the src elwidth (defaults to FP64)
* output formats (dest) are DEFINED as being "compacted and in the sensible sane way".

and vice-versa.

these both would be a null-operation (fmv) when srcwid == destwid.

it *might* be possible, with some careful analysis, to allow for fmv itself to perform this conversion process.

fmr p150 v3.0B 4.6.5
my feeling is that it would be reasonable to have these perform fcvt between the src elwidth and dest elwidth, such that followup FP operations that were also elwidth overridden had ("understood") the exact same FP format.
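to make the "bit-spread" versus "compacted" distinction concrete, here is a minimal sketch in Python for the FP32-in-FP64 case. the function names are hypothetical and are not part of any SV or OpenPOWER definition. it assumes the value is an ordinary finite number: NaN payloads and out-of-range values are not handled.

```python
import struct

def spread_fp32_to_fp64(bits32: int) -> int:
    """Hypothetical helper: take a compact FP32 bit pattern and return the
    FP64 bit pattern holding the same value (the "bit-spread" form)."""
    (value,) = struct.unpack("<f", struct.pack("<I", bits32))
    (bits64,) = struct.unpack("<Q", struct.pack("<d", value))
    return bits64

def compact_fp64_to_fp32(bits64: int) -> int:
    """Hypothetical helper: the reverse direction. Assumes the FP64 pattern
    holds a value that is exactly representable in FP32."""
    (value,) = struct.unpack("<d", struct.pack("<Q", bits64))
    (bits32,) = struct.unpack("<I", struct.pack("<f", value))
    return bits32

# 1.5 as compact FP32, then spread across an FP64 container, then back.
compact = 0x3FC00000
spread = spread_fp32_to_fp64(compact)
print(hex(spread), hex(compact_fp64_to_fp32(spread)))
```

when the src and dest widths are the same neither helper is needed, which is the fmv null-operation case described above.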
which brings us to an interesting point: what the heck does running single-precision FP ops on elwidth=32 even mean??
single precision FP Ops on elwidth=default means "do the op @ FP32 but distribute the bits across FP64"
my feeling is that this behaviour should be preserved at lower elwidth.
i.e. "do the op @ FP16 but distribute the bits across FP32".
i.e. single precision ops are redefined to be "do the op at half the precision of the elwidth"
fascinatingly if this is followed and the dest elwidth is *also* FP16 then there is a way to get faster computation even when the src elwidth is FP32 formatted.
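as a rough illustration of "do the op @ FP16 but distribute the bits across FP32", the sketch below models a single-precision add at elwidth=32 in Python. everything here is an assumption for illustration only: the function names are made up, operands arrive and leave as FP32 bit patterns, and overflow beyond the FP16 range is not handled (struct raises OverflowError in that case).

```python
import struct

def to_fp16(value: float) -> float:
    """Round a Python float to the nearest FP16 value (struct's 'e' format)."""
    (rounded,) = struct.unpack("<e", struct.pack("<e", value))
    return rounded

def from_fp32_bits(bits32: int) -> float:
    (value,) = struct.unpack("<f", struct.pack("<I", bits32))
    return value

def fp32_bits(value: float) -> int:
    (bits32,) = struct.unpack("<I", struct.pack("<f", value))
    return bits32

def fadds_half_in_fp32(a_bits: int, b_bits: int) -> int:
    """Hypothetical model: add at FP16 precision, store the result in an
    FP32 container. The sum of two FP16 values is exact in a Python float,
    so the single to_fp16() on the result is the only rounding step."""
    a = to_fp16(from_fp32_bits(a_bits))
    b = to_fp16(from_fp32_bits(b_bits))
    return fp32_bits(to_fp16(a + b))

# 1.0 + 2**-12: the small term survives at FP32 precision but is lost
# when the op is done at FP16 precision.
one = fp32_bits(1.0)
tiny = fp32_bits(2.0 ** -12)
print(hex(fp32_bits(1.0 + 2.0 ** -12)), hex(fadds_half_in_fp32(one, tiny)))
```

if the dest elwidth is FP16 as well, the final fp32_bits() step would be replaced by packing the FP16 result directly, which is where the faster path mentioned above would come from.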