Bug 726

Summary: Additional core_stop check after Execute breaks single-stepping
Product: Libre-SOC's first SoC
Reporter: Cesar Strauss <cestrauss>
Component: Source Code
Assignee: Cesar Strauss <cestrauss>
Status: CONFIRMED
Severity: major
CC: libre-soc-bugs, lkcl
Priority: High    
Version: unspecified   
Hardware: PC   
OS: Linux   
URL: https://libre-soc.org/irclog/latest.log.html#t2021-10-12T18:33:53
NLnet milestone: ---
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:
Bug Depends on:    
Bug Blocks: 737    

Description Cesar Strauss 2021-10-12 22:34:51 BST
Executing:

1) python ~/src/soc/src/soc/simple/issuer_verilog.py --disable-svp64 --debug=dmi ~/src/soc/src/soc/litex/florent/libresoc/libresoc.v

2) python ~/src/soc/src/soc/litex/florent/sim.py --debug --variant=standard

... simulates the Libre-SOC core, with an embedded FSM single-stepping it, controlled by DMI.

Right now, one in every two DMI single-step commands does not actually execute an instruction, deterministically.

Since we may want to stop the core in the middle of a VL loop, I have put another core stop check after Execute. Together with the check before Fetch, that's two core stop checks in a row.

What I didn't anticipate was core_stop being pulsed low for single-step. Since core_stop goes high again immediately, the second check (the one before Fetch) catches it, and execution doesn't resume.
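
To illustrate the mechanism, here is a toy Python model of the two stop checks. It is only a sketch, not the actual issuer FSM: the names ToyIssuer, Halt and single_step are made up for this example, and it assumes the single-step pulse on core_stop is only long enough to release one check.

```python
from enum import Enum, auto


class Halt(Enum):
    """Where the toy core is currently waiting on core_stop."""
    BEFORE_FETCH = auto()
    AFTER_EXECUTE = auto()


class ToyIssuer:
    """Hypothetical, heavily simplified stand-in for the issuer FSM."""

    def __init__(self, check_after_execute):
        # True models the extra core_stop check after Execute.
        self.check_after_execute = check_after_execute
        self.halted_at = Halt.BEFORE_FETCH
        self.executed = 0

    def single_step(self):
        """Pulse core_stop low once; return True if an instruction ran."""
        before = self.executed
        pulse_available = True  # assumption: the pulse releases one check only
        if self.halted_at is Halt.AFTER_EXECUTE:
            # The pulse is used up leaving the post-Execute check, so
            # core_stop is high again when the pre-Fetch check is reached.
            pulse_available = False
            self.halted_at = Halt.BEFORE_FETCH
        if self.halted_at is Halt.BEFORE_FETCH and pulse_available:
            self.executed += 1  # Fetch + Execute of one instruction
            if self.check_after_execute:
                self.halted_at = Halt.AFTER_EXECUTE
        return self.executed > before


buggy = ToyIssuer(check_after_execute=True)
fixed = ToyIssuer(check_after_execute=False)
buggy_trace = [buggy.single_step() for _ in range(4)]
fixed_trace = [fixed.single_step() for _ in range(4)]
print(buggy_trace)  # every other step is swallowed
print(fixed_trace)  # every step executes
```

In this model the core with both checks executes on steps 1 and 3 only, which matches the "one in two" behaviour seen in simulation.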

Unfortunately it seems likely that this bug ended up on the chip. The additional core_stop check after Execute was not conditional on --svp64.

These are the tasks as I see it:

1) Make a test-case that catches this regression
2) Fix the FSM to avoid the issue
3) Document the present behavior of the test chip
4) Develop and test mitigations for testing the chip

In principle, running two DMI single step commands in a row should work around this problem on the chip.
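
A quick sanity check of the workaround idea, again on a toy model and not on the real FSM or chip. The state names and the functions single_step and double_step are hypothetical; the model assumes each single-step pulse releases exactly one stop check.

```python
BEFORE_FETCH = "before_fetch"
AFTER_EXECUTE = "after_execute"


def single_step(halted_at):
    """One DMI single-step on the affected core: (new halt point, executed)."""
    if halted_at == AFTER_EXECUTE:
        # Pulse only releases the post-Execute check; nothing executes.
        return BEFORE_FETCH, 0
    # Pulse releases the pre-Fetch check; one instruction runs, then the
    # core halts at the check after Execute.
    return AFTER_EXECUTE, 1


def double_step(halted_at):
    """Workaround sketch: issue two single-step commands back to back."""
    halted_at, first = single_step(halted_at)
    halted_at, second = single_step(halted_at)
    return halted_at, first + second


results = {start: double_step(start) for start in (BEFORE_FETCH, AFTER_EXECUTE)}
for start, (end, count) in results.items():
    print(start, "->", end, "instructions executed:", count)
```

Under these assumptions a pair of commands executes exactly one instruction and returns the core to the halt point it started from, whichever check it was stopped at. Whether the chip behaves the same way still needs to be confirmed in testing.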

A side effect is that, after randomly stopping the core, the PC read by DMI may or may not point to the next instruction, depending on whether the last executed instruction updated the PC and the core stopped at the check after Execute.

Too bad about the chip. Let's hope the workaround actually works in practice, and doesn't impact testing by much. Sorry about this.
Comment 1 Cesar Strauss 2021-12-23 11:57:03 GMT
FSM fixed in https://git.libre-soc.org/?p=soc.git;a=commitdiff;h=a1639b39fcd13094c9469ce677f0265fd8d0fea2

Single-stepping into a VL loop no longer works, but at least DMI single-step works again, as intended, for regular cases.

Please check. Sorry for the inconvenience.
Comment 2 Luke Kenneth Casson Leighton 2021-12-25 03:00:54 GMT
works great cesar, i was able to get a verilator dump
of all instructions and regs, and tracked down the
anomaly compared to microwatt mmu. thank you