See https://zipcpu.com/blog/2017/08/14/strategies-for-pipelining.html. The use of STB/ACK results in an extra clock cycle's delay between stages (2 cycles per stage in total, rather than 1). This needs to be dealt with, and preferably removed entirely.
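As a rough illustration of where the second cycle could come from, here is a toy cycle-level model in Python. It is only a sketch under stated assumptions, not taken from the linked article or from any actual design: it assumes that in the STB/ACK case the receiving stage registers ACK on one clock edge and captures the data on the following edge, whereas in the alternative case the receiver captures data on the first edge at which it sees STB. The function name `cycles_through_pipeline` and its parameters are hypothetical.

```python
def cycles_through_pipeline(num_stages, registered_ack, max_cycles=1000):
    """Count clock edges for a single item to reach the last stage.

    Toy model (assumption): every stage has a 'valid' register, and in
    registered_ack mode also an 'ack' register that is set one edge
    before the data is actually captured.
    """
    valid = [False] * num_stages  # stage currently holds the item
    ack = [False] * num_stages    # registered ACK, used only if registered_ack
    src_stb = True                # source holds STB until its item is taken

    for cycle in range(1, max_cycles + 1):
        # Compute next-state values from current-state values only,
        # mimicking synchronous registers.
        nxt_valid = list(valid)
        nxt_ack = [False] * num_stages
        nxt_src_stb = src_stb

        for i in range(num_stages):
            stb_in = src_stb if i == 0 else valid[i - 1]
            if registered_ack:
                if ack[i]:
                    take = True           # ACK was registered last edge
                elif stb_in and not valid[i]:
                    nxt_ack[i] = True     # register ACK, transfer next edge
                    take = False
                else:
                    take = False
            else:
                take = stb_in and not valid[i]  # capture immediately

            if take:
                nxt_valid[i] = True
                if i == 0:
                    nxt_src_stb = False
                else:
                    nxt_valid[i - 1] = False

        valid, ack, src_stb = nxt_valid, nxt_ack, nxt_src_stb
        if valid[-1]:
            return cycle
    raise RuntimeError("item never reached the last stage")


if __name__ == "__main__":
    print(cycles_through_pipeline(3, registered_ack=True))   # 6
    print(cycles_through_pipeline(3, registered_ack=False))  # 3
```

In this model the registered-ACK variant costs two clock edges per stage and the other variant costs one, which matches the "2 cycles total" figure above. Whether the real design loses its cycle for this reason would need to be checked against the actual HDL.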