| Summary: | design and discuss user-tag flags in wishbone to provide phase 1 / 2 "speculative" memory accesses |
|---|---|
| Product: | Libre-SOC's first SoC |
| Reporter: | Luke Kenneth Casson Leighton <lkcl> |
| Component: | Source Code |
| Assignee: | Luke Kenneth Casson Leighton <lkcl> |
| Status: | CONFIRMED |
| Severity: | enhancement |
| CC: | libre-soc-bugs, yimmanuel3 |
| Priority: | --- |
| Version: | unspecified |
| Hardware: | PC |
| OS: | Mac OS |
| See Also: | https://bugs.libre-soc.org/show_bug.cgi?id=393, https://bugs.libre-soc.org/show_bug.cgi?id=401, https://bugs.libre-soc.org/show_bug.cgi?id=410 |
| NLnet milestone: | --- |
| total budget (EUR) for completion of task and all subtasks: | 0 |
| budget (EUR) for this task, excluding subtasks' budget: | 0 |
| parent task for budget allocation: | |
| child tasks for budget allocation: | |
| The table of payments (in EUR) for this task; TOML format: | |
| Bug Depends on: | |
| Bug Blocks: | 383 |
Description
Luke Kenneth Casson Leighton
2020-06-21 15:52:16 BST
This makes sense. I went through the LDST docs on the wiki yesterday too and learned as much. Wishbone would have to be the very bottom of the stack, after we have some LDST operation that we know *must* be executed. Clearly, wishbone is inappropriate for L0.

(In reply to Luke Kenneth Casson Leighton from comment #0)
> the requirements are - and this is not optional - that memory requests be
> subdivided into two phases:
>
> 1) checking whether the request *CAN* be completed - WITHOUT EXCEPTIONS - if
> it were to be permitted to proceed
>
> 2) allowing the memory request to proceed.

There are some peripherals which only error after proceeding to the point where it is impossible to cancel a request, so our memory-request state diagram needs to handle that properly, without hard-locking the CPU or creating an imprecise interrupt. This would involve waiting until non-speculative for non-cachable memory addresses, and potentially serialising operations.

Cachable memory that is known to be in the cache can be speculatively read and non-speculatively written in parallel, no serialisation required (unless using memory fences / atomic ops). Cachable memory that is in the cache is also known not to cause memory exceptions (assuming MMU translation and checking has already been done, and ignoring ECC cache memory failures).

(In reply to Yehowshua from comment #1)
> This makes sense.
> I went through the LDST docs on the wiki yesterday too and learned as much.

i put the bit about contracts here:
https://libre-soc.org/3d_gpu/architecture/6600scoreboard/discussion/

> Wishbone would have to be the very bottom of the stack after we have some
> LDST operation that we know *must* be executed.

*deep breath* - i thought so, too.
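the two-phase requirement can be sketched as a small behavioural model in Python. note this is a sketch only, with hypothetical names (`MemRequest`, `phase1_check` etc. are illustrative, not the actual Libre-SOC HDL, which would be nmigen):

```python
from enum import Enum, auto

class Phase(Enum):
    IDLE = auto()        # not yet permitted to proceed
    CHECKED_OK = auto()  # phase 1 passed: request *CAN* complete
    COMPLETED = auto()   # phase 2 done: request was allowed to proceed
    FAULT = auto()       # phase 1 failed: raise exception, memory untouched

class MemRequest:
    """Illustrative model of the two-phase memory-request contract."""
    def __init__(self, addr, cachable, speculative):
        self.addr = addr
        self.cachable = cachable
        self.speculative = speculative
        self.phase = Phase.IDLE

    def phase1_check(self, mmu_ok):
        # phase 1: determine whether the access *CAN* complete without
        # exceptions, before any memory is actually touched
        if not mmu_ok:
            self.phase = Phase.FAULT
        elif not self.cachable and self.speculative:
            # non-cachable: must wait until no longer speculative
            self.phase = Phase.IDLE
        else:
            self.phase = Phase.CHECKED_OK
        return self.phase

    def phase2_proceed(self):
        # phase 2: only a request that passed phase 1 may touch memory
        assert self.phase == Phase.CHECKED_OK
        self.phase = Phase.COMPLETED
        return self.phase
```

the key point the model captures: a speculative non-cachable access simply stays in IDLE until speculation resolves, rather than ever reaching a point where it could error uncancellably.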
unfortunately, the discrepancy between the two types of contract - "here's my offer, take it or leave it" and "here's an offer, please take all the time you need to think about it" - is so fundamental and diametrically incompatible that we can't even use wishbone, as-is, at the lowest level!

or... we can... as long as that usage is absolutely guaranteed one hundred percent NEVER to fail or raise any kind of error. in other words, you may *only* use "take-it-or-leave-it" contracts (buses such as Wishbone) for the *TAKE-IT* part, because the actual Contract of Sale clearly states that things have moved *into* the "TAKE IT" phase.

> Clearly, wishbone is inappropriate for L0.

well... an *augmented* version of wishbone is appropriate (one that obeys the standard "Contract of Sale" outlined above). or, we can use standard Wishbone for "take it", i.e. if it is guaranteed that there will be no errors. we will still actually need to keep that error capability; however if such an error does occur (at the low levels) its status is promoted to "catastrophic contract violation" and we halt the processor or fire a "severe NMI hard-fault" trap condition.

(In reply to Jacob Lifshay from comment #2)
> (In reply to Luke Kenneth Casson Leighton from comment #0)
> > the requirements are - and this is not optional - that memory requests be
> > subdivided into two phases:
> >
> > 1) checking whether the request *CAN* be completed - WITHOUT EXCEPTIONS - if
> > it were to be permitted to proceed
> >
> > 2) allowing the memory request to proceed.
>
> There are some peripherals where they only error after proceeding to the
> point where it's impossible to cancel a request, so, our memory request
> state diagram needs to properly handle that, without just hard-locking the
> CPU or creating an imprecise interrupt.

yes. POWER architecture recognises that these peripherals exist, and puts them into the "atomic" category. there's a section on them, somewhere.
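the "promotion to catastrophic" rule can be shown with a minimal sketch (hypothetical names, not the actual Libre-SOC code): once a request has entered the take-it phase, a bus error is no longer a recoverable exception, because phase 1 already promised success.

```python
class CatastrophicContractViolation(Exception):
    """Bus error during the take-it phase: phase 1 guaranteed success,
    so the only responses left are halting the processor or a severe
    NMI-style hard-fault trap."""

def handle_bus_error(in_take_it_phase):
    # hypothetical error handler, illustrating the promotion rule
    if in_take_it_phase:
        # phase 1 already guaranteed this access could not fail;
        # an error here means the guarantee itself was violated
        raise CatastrophicContractViolation("halt / severe NMI hard-fault")
    # before the take-it phase, an error is just an ordinary precise
    # memory exception, reported back to the core as usual
    return "precise-exception"
```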
this in turn holds up (entirely) all subsequently-issued LD/STs, preventing them even from exiting anything beyond the "GO_ADDR" phase (the computation of the Effective Address).

> This would involve waiting until non-speculative for non-cachable memory
> addresses and potentially serializing operations.

correct. once the (effectively atomic) LD/ST has proceeded past its "take-it-or-leave-it" contract, further "speculative" contracts may proceed in parallel. (see https://libre-soc.org/3d_gpu/architecture/6600scoreboard/discussion/ for an explanation of the contract terminology)

i'm reading the wb4 spec, and the table in section 3.1.6 shows the 4 types of user-defined signals permitted to be added. or, more to the point, if added they must be "tagged" in the datasheet and must also respect the timing protocol associated with that tag. however none of these 4 tag types perfectly fits the "shadow" system aka "standard contract of sale". we may have to do this:

* define a "cycle" tagged signal that indicates that the bus is to follow "shadow" protocol. this is raised and held for the whole CYC_O
* at that point, the slave can assume that all operations are implicitly under "shadow" conditions and that it must wait for "success or fail".
* the address will be sent as normal for a read
* however the slave MUST wait for the master to raise a DAT_O tag (despite this being a read) of EITHER success or fail (GO_DIE).

i am inclined to recommend that the slave be required to raise STALL_I at this point, until either success or fail is raised. btw that fail is synonymous with RST. it is the same thing. however i do not know at this point if it is a bit drastic to do a full RST. alternatively we could simply specify that if CYC is dropped while STALL_I is raised, this is equivalent to "GO_DIE". success on the other hand is simple enough.

remember - irritatingly - we cannot pass shadow itself through, because it does not fit *any* of the 4 tag types. unless of course we simply define that it is permitted to.
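a behavioural model of the proposed shadow-tagged cycle may make the handshake clearer. this is a sketch only, with assumed event names ("wait", "success", "go_die") standing in for the real tag signalling; actual hardware would hold the tag alongside CYC_O:

```python
def shadow_cycle(master_events):
    """Simulate the proposed shadow-tagged Wishbone cycle (model only).

    master_events: events from the master's shadow logic, in order:
      "wait"    - still speculative: slave must keep STALL asserted
      "success" - shadow resolved OK: slave may complete the access
      "go_die"  - shadow cancelled (or CYC dropped while stalled)
    Returns the trace of slave-side responses."""
    trace = []
    for ev in master_events:
        if ev == "wait":
            # address phase accepted, but the slave stalls: the shadow
            # tag says "success or fail" has not yet arrived
            trace.append("STALL")
        elif ev == "success":
            trace.append("ACK")      # complete the (read) access
            break
        elif ev == "go_die":
            trace.append("CANCEL")   # equivalent to dropping CYC / RST
            break
    return trace
```

the point the model makes: the slave does nothing irreversible while stalled, so a "go_die" arriving at any time before "success" cancels cleanly.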
this would be easier and fit better. more thought required.

right. another thought occurred to me.

1. peripherals have to be done as "take it or leave it" style wishbone access.
2. main memory (DRAM) also falls into this category. note: that's not *cached* memory, it's *actual* memory (via SDRAM wishbone controller or LiteDRAM etc.)
3. it is only the *processor* that needs to perform these speculative-style "house contract of sale" requests.
4. therefore we *are* actually free and at liberty to design and use an internal bus architecture, which L0, L1, L2 and TLB and MMU understand, that respects the "house contract of sale" interface, this being an internal protocol.
5. however when interfacing to *peripherals* we must treat them as atomic, and can use the take-it-or-leave-it protocol, falling back to single blocking operations and thus safely use wishbone.
6. as far as memory (DRAM) is concerned, as long as *batches* are respected (batches of LD requests that do not overlap with batches of STs), and once we have determined that the addresses of all batches are valid, these LD-only or ST-only batches can be done in any order at any width.

in addition: given that we are only doing a single core, we have only one access route to memory to worry about. we are also not going to put VM in... yet.

now, the discerning factor which tells us the difference between memory and peripherals is: the address. and it is the address that we need to check first at the "house contract Phase 1". this is incredibly simple:

* is the address in range of real DRAM, yes or no.

if yes, we ASSUME, reasonably, that when it proceeds to Phase 2 it will succeed. after that point we *CAN* in fact use minerva for accessing DRAM, because it is guaranteed to succeed. errors however are promoted to "catastrophic".

for peripherals, these fall back to atomic blocking operations, so we can *still* use the minerva LoadStoreInterface; however errors are straight exceptions rather than catastrophic.
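the Phase-1 address check really is this simple to express. a sketch (the DRAM window below is an assumed placeholder, not the actual Libre-SOC memory map):

```python
# assumed placeholder memory map: real DRAM is one contiguous window
DRAM_BASE = 0x4000_0000
DRAM_SIZE = 0x1000_0000  # 256 MiB, illustrative only

def phase1_classify(addr):
    """Phase 1 of the 'house contract': classify by address alone.

    DRAM -> assume Phase 2 will succeed; any error after that point
            is promoted to catastrophic.
    else -> peripheral: atomic blocking access, errors are ordinary
            (precise) exceptions."""
    if DRAM_BASE <= addr < DRAM_BASE + DRAM_SIZE:
        return ("dram", "catastrophic-on-error")
    return ("peripheral", "exception-on-error")
```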
for peripherals, the L1 cache must also be bypassed, because you have to actually do the read and actually do the write. this is slow, and is what DMA is for, but hey.