Bug 707

Summary: PartitionedSignal limited Cat function needed
Product: Libre-SOC's first SoC Reporter: Luke Kenneth Casson Leighton <lkcl>
Component: Source Code Assignee: Luke Kenneth Casson Leighton <lkcl>
Status: RESOLVED FIXED    
Severity: enhancement CC: libre-soc-bugs
Priority: ---    
Version: unspecified   
Hardware: Other   
OS: Linux   
URL: https://libre-soc.org/3d_gpu/architecture/dynamic_simd/cat
See Also: https://bugs.libre-soc.org/show_bug.cgi?id=458
https://bugs.libre-soc.org/show_bug.cgi?id=115
NLnet milestone: NLnet.2019.02.012
total budget (EUR) for completion of task and all subtasks: 250
budget (EUR) for this task, excluding subtasks' budget: 250
parent task for budget allocation: 132
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:
[lkcl]
amount = 250
submitted = 2021-12-09
paid = 2021-12-09
Bug Depends on:    
Bug Blocks: 132    

Description Luke Kenneth Casson Leighton 2021-09-23 20:50:31 BST
a SIMD-aware Cat function is needed which can cope with
concatenation of PartitionedSignals together yet creates
the right output regardless of partition bits at runtime

PartitionedSignal:

https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part/partsig.py;hb=HEAD

Comment 1 Luke Kenneth Casson Leighton 2021-09-23 22:41:46 BST
i went over the cases (see the wiki page linked in the URL field)
and worked out that, as long as the inputs are all
PartitionedSignals, a SIMD Cat() is possible.

what is *not* possible is to mix in non-Partitioned
with Partitioned Signals, because without subdivisions
the lengths vary in non-proportional ways.

Comment 2 Luke Kenneth Casson Leighton 2021-09-24 01:12:16 BST
looking at the tables created in the URL wiki page, the algorithm appears to be:

m.Switch()
for pbits cases: 0b000 to 0b111
  output = []
  # set up some yielders which will retain where they each got to
  # then when called below in the inner nested loop they give
  # the relevant sequential chunk
  yielders = [Yielder(a), Yielder(b), ....]
  runlist = split pbits into runs of zeros
  for y in yielders: # for each signal a b c d ...
     for i in runlist: # for each partition
        for _ in range(i+1): # for the length of each partition
            thing = yield from y # grab sequential chunks
            output.append(thing)
  with m.Case(pbits):
     comb += out.eq(Cat(*output))

where Yielder() is a function that yields one partition
at a time from the PartitionedSignal.
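
a minimal sketch of the loop above in plain Python, with no nmigen
involved. everything named here is hypothetical: `zero_runs`, `chunks`
and `cat_order` are illustration-only helpers, chunks are represented
by string labels, and bit i of pbits is assumed to mark a break between
chunk i and chunk i+1. the partition loop is placed outermost, on the
assumption that each output lane should hold the same lane of every
input. note that `next()` is what pulls a single item from a generator;
`yield from` delegates to it instead.

```python
def zero_runs(pbits, nbits):
    """Split pbits into runs of zeros (assumed: a set bit is a break)."""
    runs, zeros = [], 0
    for i in range(nbits):
        if pbits & (1 << i):
            runs.append(zeros)   # a break closes the current run
            zeros = 0
        else:
            zeros += 1
    runs.append(zeros)           # the final run has no closing break
    return runs


def chunks(name, nchunks):
    """Generator standing in for Yielder(): one chunk label per call."""
    for i in range(nchunks):
        yield f"{name}{i}"


def cat_order(names, pbits, nbits):
    """Return chunk labels in output order, least significant first."""
    yielders = [chunks(n, nbits + 1) for n in names]
    output = []
    for run in zero_runs(pbits, nbits):   # for each partition
        for y in yielders:                # for each signal a b c d ...
            for _ in range(run + 1):      # a run of N zeros spans N+1 chunks
                output.append(next(y))    # each generator remembers its place
    return output


if __name__ == "__main__":
    # two partitions of two chunks each
    print(cat_order(["a", "b"], 0b010, 3))
    # ['a0', 'a1', 'b0', 'b1', 'a2', 'a3', 'b2', 'b3']
```

in real hardware each case of the Switch would get its own list like
this, passed to Cat() inside the matching m.Case.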

another way to do this is just to have a list of
indices which get incremented, and to select
the partition data explicitly.

Comment 3 Luke Kenneth Casson Leighton 2021-09-24 20:01:24 BST
(In reply to Luke Kenneth Casson Leighton from comment #2)

> where Yielder() is a function that yields one partition
> at a time from the PartitionedSignal.

drat. i may have gotten confused about how to use yield
 
> another way to do this is just to have a list of
> indices which get incremented and explicitly select
> the partition data explicitly.

i went this route instead.
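
a rough sketch of what the index-based route could look like, modelled
on plain Python integers rather than nmigen Signals. this is not the
code that was committed: `partition_spans` and `simd_cat` are
hypothetical names, all inputs are assumed to share the same chunk
width, and bit i of pbits is assumed to mark a break between chunk i
and chunk i+1.

```python
def partition_spans(pbits, nbits):
    """Return (start_chunk, length_in_chunks) per partition, lowest first."""
    spans, start = [], 0
    for i in range(nbits):
        if pbits & (1 << i):               # break after chunk i
            spans.append((start, i + 1 - start))
            start = i + 1                  # explicit index, no generator
    spans.append((start, nbits + 1 - start))
    return spans


def simd_cat(values, chunk_width, pbits, nbits):
    """Concatenate matching lanes of every input, lane by lane."""
    out, shift = 0, 0
    for start, length in partition_spans(pbits, nbits):
        width = length * chunk_width
        for v in values:
            # select this partition's bits from the input by index
            lane = (v >> (start * chunk_width)) & ((1 << width) - 1)
            out |= lane << shift
            shift += width
    return out


if __name__ == "__main__":
    a, b = 0xA3A2A1A0, 0xB3B2B1B0          # four 8-bit chunks each
    for pbits in (0b000, 0b010, 0b111):
        print(f"{pbits:03b} {simd_cat([a, b], 8, pbits, 3):016X}")
```

with no breaks (0b000) the result is the ordinary Cat of a and b; with
every break set (0b111) the 8-bit chunks of a and b interleave.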