-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor IOStreamer and QSPI #700
base: main
Are you sure you want to change the base?
Refactor IOStreamer and QSPI #700
Conversation
Added support for additionally delaying when the sample is taken. Normally the samples are taken at a time when the input signals change that have been launched at the same time as i_en, however this allows us to take samples later, specified in number of sync clock cycles. Additionally for ratio=2 iostreamer we can also delay by half a clock cycle.
… DDR inputs" This reverts commit d1cc5d9.
This changes how the sdr input sampling test works. Before only positive edges on clk_out would signify to the testbench that the input has been sampled. The old behavior put a constraint on payload, that i_en must be high only for every second payload. This was less then ideal for developing stronger skid buffer tests. This change slows down the clk_out signal, and now any edge (positive or negative), means that we have sampled something. In the waves only the shape of clk_out changes, the rest of the signals stay the same. (i.e. the same values are sampled at the same times) The DDR input sample test does not require a similar change to it.
There was a corner-case in IOStreamer where a sample could be lost, if it arrives to the skid buffer on the same cycle as when another sample is removed from the skid buffer. This PR also adds many more testcases.
This better relays the intent that we want to sample the "old" value of the inputs in case they change at the same time as clk_out, and may work better if input_generator_tb changes in the future.
…of SimulatableDDRBuffer is glitchless
This change allows o_stream to take at least one transfer, as long as there are no sample transfers in flight. This will prevent deadlocks, for example if i_stream is implemented with something like: ```m.d.comb += ready.eq(valid)``` Which is valid according to stream rules. This commit also adds tests for this.
7250f61
to
9b22ff5
Compare
There is a slight change in behavior, because the "bypass" feature has been removed, CS clock pulses must now be at least 1 QPSI cycle long. `sample_delay_half_clocks` is now runtime-selectable. `max_sample_delay_half_clocks` is now a configuration option, for the IOStreamerTop to know how many resources to allocate. `min_divisor` is now a configuration option, that if set to > 0, allows IOStreamer to optimize away some uneeded resources. The bulk of the changes is about cutting up the functionality into smaller sub-components.
9b22ff5
to
a4f9696
Compare
TLDR: I think I was wrong about the folloing statement and I want to rework sample_delay-related resource optimizations, to make it possible to use smaller skid-buffer and less delayer shift register stages at slow clock speed.
I re-thought the statement above and I think I was wrong, I think an optimization that is a lot closer to what Cathrine originally described on discord is possible, without needing non-zero This can then optionally be further extended if we add a Now assuming
I think I'm tending towards it's worth it. Next time I'm looking at this I think I'll try giving this idea an implementation. |
Just to be clear, I still plan to re-do the infrastructure myself at some point. As long as you're fine with duplicated work, that's going to be OK. |
That wasn't clear, any way, thanks for clarifying. I'll set this to draft, feel free to reuse any of my code if you find it useful. |
The reason I'm saying that is because I'd like the implementation to be suitable to upstream to amaranth-stdio. Yours is very complex and has a lot of idiosyncrasies, which would be fine for a Glasgow applet but isn't going to cut it for a generic and reusable module that I expect everyone to use. I've scrolled through it and I don't feel that it is suitable to be moved to amaranth-stdio; I do appreciate that you've put an enormous amount of work into it and I would like to discuss the mechanism with you to make sure that whatever eventually ends up in amaranth-stdio is of high quality. |
This refactors the entire IOStreamer infrastructure and how it's used with QSPI.
Here's a rough sketch of what the structure looks like:
This PR is correct with respect to all of the pending bugs that the existing IOStreamer and QSPI controller has, that is:
In addition what's new in the PR:
sample_delay_half_clocks
is runtime-configurable, withmax_sample_delay_half_clocks
allows IOStreamer to configure the right amount of resourcesdivisor
must be known byIOStreamerTop
in order to make optimizations like that, so I have integrated IOStretcher into IOStreamerTop.divisor
is a runtime setting, so we cannot minimize logic based on its value, so I have added amin_divisor
configuration option.divisor
must always be>= min_divisor
. Themin_divisor
option can be used to optimize away skid buffer stages, and to optimize the shift register stages that delay samples from entering the skid buffer.ceil(max_latency / min_divisor)
, which can sometimes end up being as small as a single stage.meta
from entering the skid buffer, can be implemented more efficiently by using a counter. Unfortunately if the maximum latency is large, then the counter will only count up tomin_divisor
, and beyond that the rest of the latency will be implemented with shift registers. However if the maximum latency is small, The counter will suffice.min_divisor
configuration option, then I don't think that we can make any kind of optimization and we are forced to have max_latency shift register stages fori_en
/meta
and max_latency skid buffer stages as well.i_en
individually for each lane. And that means thati_en
can be set high only for one lane, so while on 2-to-1-lane conversion we always get pairs, the returning samples will not be paired anymore. One way we could handle that would be to have i_en be an attribute outside of lanes, or to have i_en be or-ed together, and overridden to be the same for all lanes, before 2-to-1 conversion, but this is undesireable.IO2LaneTo1Lane
converter adds extra information tometa
, that identifies which lane each transaction originally belonged to. Then then entire IOStreamerTop structure processes the transaction with the expandedmeta
field, and when samples come back to IOClockerDeframer, they have the corresponding fieldstag
andlast
.tag
will tell theIO1LaneTo2Lane
upconverter, to which lane each sample belongs tolast
will tell theIO1LaneTo2Lane
upconverter, if it needs to wait for another sample before assembling the 2-lane transaction and sending it to QSPIDeframer.last
is generated inIO2LaneTo1Lane
by looking at lane 1's i_en. Iflane[1].i_en
is high, thanlane[0].meta.last
is low, andlane[1].meta.last
is high, otherwiselane[0].meta.last
is high.