| Summary: | parallel prefix sum polynomial interpolation | ||
|---|---|---|---|
| Product: | Libre-SOC's first SoC | Reporter: | Luke Kenneth Casson Leighton <lkcl> |
| Component: | Documentation | Assignee: | Luke Kenneth Casson Leighton <lkcl> |
| Status: | CONFIRMED --- | ||
| Severity: | enhancement | CC: | libre-soc-bugs, programmerjake |
| Priority: | High | ||
| Version: | unspecified | ||
| Hardware: | Other | ||
| OS: | Linux | ||
| NLnet milestone: | --- | total budget (EUR) for completion of task and all subtasks: | 0 |
| budget (EUR) for this task, excluding subtasks' budget: | 0 | parent task for budget allocation: | |
| child tasks for budget allocation: | The table of payments (in EUR) for this task; TOML format: | ||
| Bug Depends on: | |||
| Bug Blocks: | 953, 1019 | ||
|
Description
Luke Kenneth Casson Leighton
2023-05-19 09:47:04 BST
<markos_> I've done a similar method in the past <programmerjake> idk, but parallel prefix sum and parallel tree reduction can be used for polynomial evaluation: <programmerjake> reduce(element_wise_mul(c, prefix_sum([1] + splat(x), op=mul)), op=add) == <programmerjake> c[0]+c[1]*x+c[2]*x^2+c[3]*x^3+c[4]*x^4... <programmerjake> though it tends to have lower precision than using fma because of rounding before adding (In reply to Luke Kenneth Casson Leighton from comment #2) > <markos_> I've done a similar method in the past > <programmerjake> idk, but parallel prefix sum and parallel tree reduction > can be used for polynomial evaluation: > <programmerjake> reduce(element_wise_mul(c, prefix_sum([1] + splat(x), > op=mul)), op=add) == > <programmerjake> c[0]+c[1]*x+c[2]*x^2+c[3]*x^3+c[4]*x^4... > <programmerjake> though it tends to have lower precision than using fma > because of rounding before adding ok writing this out myself so that my brain absorbs it * start with: [1, x, x, x, x...] * perform prefix sum multiply (parallel is fine) [1, x, x^2, x^3, ....] * take vector of c [c0, c1, c2, c3, c4...] * straight multiply: [c0, c1*x, c2*x^2, ....] * perform reduction c0 + c1*x + c2*x^2 ... ok that is dead easy. so now what about parallel reduction using fmac? let us first try mapreduce on fmac: sv.madd/mr res, *c, *xv, res that looks perfectly fine, and can get accuracy from fmac. now parallel reduction, res would have to be a vector, but i don't believe it works, because *c (or *xv) would end up getting multiplied in more than once. unless done as Vertical-First. (In reply to Luke Kenneth Casson Leighton from comment #3) > (In reply to Luke Kenneth Casson Leighton from comment #2) > > <markos_> I've done a similar method in the past > > <programmerjake> idk, but parallel prefix sum and parallel tree reduction > > can be used for polynomial evaluation: > > <programmerjake> reduce(element_wise_mul(c, prefix_sum([1] + splat(x), > > op=mul)), op=add) == > > <programmerjake> c[0]+c[1]*x+c[2]*x^2+c[3]*x^3+c[4]*x^4... > > <programmerjake> though it tends to have lower precision than using fma > > because of rounding before adding > > ok writing this out myself so that my brain absorbs it > > * start with: > [1, x, x, x, x...] > * perform prefix sum multiply (parallel is fine) > [1, x, x^2, x^3, ....] > * take vector of c > [c0, c1, c2, c3, c4...] > * straight multiply: > [c0, c1*x, c2*x^2, ....] > * perform reduction > c0 + c1*x + c2*x^2 ... parallel reduction is fine here too -- reduce using fadd, though as noted it gets less accuracy than using fma (because fma doesn't round between the mul and add and because higher powers of x tend to give monomials (includes coefficients) that are smaller, so adding a smaller approximate value to a larger exact value (the coefficient) gives a value with higher relative precision). (edit: add forgotten parenthesis) (In reply to Jacob Lifshay from comment #4) > parallel reduction is fine here too -- reduce using fadd, though as noted it > gets less accuracy than using fma (because fma doesn't round between the mul > and add and because higher powers of x tend to give monomials (includes > coefficients) that are smaller, so adding a smaller approximate value to a > larger exact value (the coefficient) gives a value with higher relative > precision). sounds to me like /mrr (reverse-gear) would be sensible to use, at the end where the tinier values would accumulate with better accuracy. exactly the kind of thing to put in an app note. |