| Summary: | elwidth and indirection: two vectors, one width | ||
|---|---|---|---|
| Product: | Libre-SOC's first SoC | Reporter: | Alexandre Oliva <oliva> |
| Component: | Specification | Assignee: | Luke Kenneth Casson Leighton <lkcl> |
| Status: | CONFIRMED --- | ||
| Severity: | enhancement | CC: | libre-soc-isa, programmerjake |
| Priority: | --- | ||
| Version: | unspecified | ||
| Hardware: | PC | ||
| OS: | Other | ||
| NLnet milestone: | --- | total budget (EUR) for completion of task and all subtasks: | 0 |
| budget (EUR) for this task, excluding subtasks' budget: | 0 | parent task for budget allocation: | |
| child tasks for budget allocation: | The table of payments (in EUR) for this task; TOML format: | ||
| Bug Depends on: | |||
| Bug Blocks: | 213 | ||
|
Description
Alexandre Oliva
2021-01-07 14:36:17 GMT
I've just realized that the phrase "two vectors" in the subject may be both inaccurate and misleading.

so, to try to be abundantly clear, I'm mainly talking about the (potential) vector of addresses, and the (potential) vector of objects it/they refer to, NOT about the vector register that will hold the loaded values.

also, I am mostly sure that in the end only one of the (potential) vectors ends up being an actual vector, though subvl>1 might actually turn out to make both of them actual vectors.

it also occurs to me now to wonder whether there is any case (or way to express) that both are scalars, as in, load this single value from memory, and then place it in all elements of the destination vector.

(In reply to Alexandre Oliva from comment #1)
> I've just realized that the phrase "two vectors" in the subject may be both
> inaccurate and misleading.
>
> so, to try to be abundantly clear, I'm mainly talking about the (potential)
> vector of addresses, and the (potential) vector of objects it/they refer to,
> NOT about the vector register that will hold the loaded values.

you are therefore probably talking about indexed mode. i removed indexed mode when illustrating the pseudocode for you because you asked about what is termed "unit stride" mode.

> also, I am mostly sure that in the end only one of the (potential) vectors
> ends up being an actual vector, though subvl>1 might actually turn out to
> make both of them actual vectors.

remember SUBVL is effectively simply a multiplier (num actual elements VL*SUBVL) and that SV is never actually switched off: scalars are just "when SUBVL=1 and VL=1"

> it also occurs to me now to wonder whether there is any case (or way
> to express) that both are scalars, as in, load this single value from
> memory, and then place it in all elements of the destination vector.

yyyepp. that's standard twin predication VSPLAT behaviour on top of a LDST "thing".

although i think i see where you're going with this: i will have to check.

idea: re-purpose the 2 bits from src width to specify mode:

* unit strided
* element strided
* indexed
* RESERVED

two modes:

* imm(r) - straight load
* r(r) - INDEXED load

so, for `ld reg, imm(reg)`, the src elwidth specifies:
0 -- unit stride -- loads from reg + imm + load_size * element_index
1 -- strided with stride of imm -- loads from reg + imm * element_index
    written: ld reg, (reg), stride=imm
2, 3 -- reserved -- maybe split imm bits between offset and stride?
    written: ld reg, offset_imm(reg), stride=stride_imm

for `ld reg, (base_reg + index_reg)`, the src elwidth specifies the elwidth of index_reg, base_reg is always 64-bit.

similarly for store.

(In reply to Jacob Lifshay from comment #6)
> so, for `ld reg, imm(reg)`, the src elwidth specifies:
> 0 -- unit stride -- loads from reg + imm + load_size * element_index
> 1 -- strided with stride of imm -- loads from reg + imm * element_index
> written: ld reg, (reg), stride=imm
> 2, 3 -- reserved -- maybe split imm bits between offset and stride?

mmm there's only 16 bits (signed/unsigned), not too keen on limiting expectations (and altering compiler from scalar behaviour)

RVV sets an "ordered/unordered" mode, which is interesting.

other options: select to use the *dest* elwidth as the unit stride multiplier. this will give some weird overlaps when using e.g. ld (64 bit) with dest elwidth=8, and some stranger overlaps for ST.

also, it turns out that when RA is vectorised, unit stride is absurd nonsense.

EA = iregs[RA+i] + i*imm

naah. so i disable unit stride there and make it just:

EA = iregs[RA+i] + i*imm

this leaves the "mode" bits doing nothing. what to do there? can we do anything with the 2 bits? put them back to src elwidth?

> written: ld reg, offset_imm(reg), stride=stride_imm
>
> for `ld reg, (base_reg + index_reg)`, the src elwidth specifies the elwidth
> of index_reg, base_reg is always 64-bit.

yes, this is just necessary.

get_polymorphic_reg(RA, elwidth=64, i) rather than elwidth=op_wid.

> similarly for store.

yes. it's the EA (effective address)

question is, does "mode" do anything useful? 2 bits, 4 options... i'm honestly not thinking of anything that really stands out except perhaps ordered/unordered

hmmm

(In reply to Luke Kenneth Casson Leighton from comment #7)
> RVV sets an "ordered/unordered" mode, which is interesting.

and also breaks the expectation of compliance with scalar "Program Order". so, that's out.

after some thought, the only place i'm seeing it necessary to add a different mode is on the immediate-version, when the source RA is scalar.

funnily enough this meshes with the fail-first idea which we saw back in bug #561

there's enough bits there to do strange things. just needs properly going through it.

arg the entire table "mode" makes no sense. reduce on LD/ST? err... the whole thing needs shuffling. |
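For readers following the thread, the three addressing modes proposed in comment #6 can be sketched as plain address arithmetic. This is only an illustration of the formulas quoted above, not the spec's pseudocode: the function names are hypothetical, RA is taken to be scalar, each index element is held in its own list entry rather than packed into registers, and index values are assumed to be zero-extended (the thread does not say whether they are signed).

```python
MASK64 = (1 << 64) - 1  # the base register is always treated as 64-bit


def ea_unit_stride(base, imm, load_size, vl):
    """Mode 0: EA = reg + imm + load_size * element_index."""
    return [(base + imm + load_size * i) & MASK64 for i in range(vl)]


def ea_element_stride(base, imm, vl):
    """Mode 1: EA = reg + imm * element_index (imm acts as the stride)."""
    return [(base + imm * i) & MASK64 for i in range(vl)]


def ea_indexed(base, index_elems, index_elwidth, vl):
    """Indexed: EA = base_reg + index_reg[i].

    Only the index is narrowed to index_elwidth bits; the base stays 64-bit.
    Zero-extension of the index is an assumption made for this sketch.
    """
    mask = (1 << index_elwidth) - 1
    return [(base + (index_elems[i] & mask)) & MASK64 for i in range(vl)]


if __name__ == "__main__":
    # 4-byte loads, VL=3, base 0x1000, immediate 8
    print([hex(a) for a in ea_unit_stride(0x1000, 8, 4, 3)])
    # stride of 16 bytes between elements
    print([hex(a) for a in ea_element_stride(0x1000, 16, 3)])
    # 8-bit index elements: upper bits of each index are ignored
    print([hex(a) for a in ea_indexed(0x2000, [0x101, 0x02, 0x1FF], 8, 3)])
```

The indexed case shows why the base must be read at a fixed 64-bit width while only the index follows the src elwidth: narrowing the base as well would truncate the address itself.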