Bug 965

Summary: implement chacha20
Product: Libre-SOC's first SoC
Reporter: Luke Kenneth Casson Leighton <lkcl>
Component: Source Code
Assignee: Luke Kenneth Casson Leighton <lkcl>
Status: RESOLVED FIXED
Severity: enhancement
CC: libre-soc-bugs
Priority: ---    
Version: unspecified   
Hardware: Other   
OS: Linux   
See Also: https://bugs.libre-soc.org/show_bug.cgi?id=770
https://bugs.libre-soc.org/show_bug.cgi?id=969
https://bugs.libre-soc.org/show_bug.cgi?id=1157
NLnet milestone: ---
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:
Bug Depends on:    
Bug Blocks: 1007    

Description Luke Kenneth Casson Leighton 2022-10-23 11:39:31 BST
vector-efficient implementation of chacha20 is needed
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_chacha20.py;hb=HEAD
Comment 1 Luke Kenneth Casson Leighton 2022-10-23 11:52:01 BST
getting very interesting.  svindex works for the inner rounds, svstep is
used for the inner round loop, and CTR mode for the outer loop.

there is however an opportunity to reorder the access to elements such that
the parallelism originally intended for chacha20 by bernstein is possible,
and it involves 3D REMAP.

an attempt to deploy only 2D REMAP was not successful, because the Indices
are set up to cover both round-groups (the straight, column-wise group of
16 indices followed by the rotated, diagonal group of 16):

       CHACHA_QUARTER_ROUND(w[0], w[4], w[8], w[12]);
       CHACHA_QUARTER_ROUND(w[1], w[5], w[9], w[13]);
       CHACHA_QUARTER_ROUND(w[2], w[6], w[10], w[14]);
       CHACHA_QUARTER_ROUND(w[3], w[7], w[11], w[15]);
  
       CHACHA_QUARTER_ROUND(w[0], w[5], w[10], w[15]);
       CHACHA_QUARTER_ROUND(w[1], w[6], w[11], w[12]);
       CHACHA_QUARTER_ROUND(w[2], w[7], w[8], w[13]);
       CHACHA_QUARTER_ROUND(w[3], w[4], w[9], w[14]);
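
for reference, the eight calls above can be written out as a short Python
sketch. this is not the SVP64 implementation from the unit test: the names
rotl32, quarter_round, QR_INDICES and double_round are made up for
illustration, and the quarter-round is the standard one from RFC 8439
section 2.1.

```python
MASK32 = 0xFFFFFFFF


def rotl32(v, n):
    """Rotate a 32-bit value left by n bits."""
    return ((v << n) | (v >> (32 - n))) & MASK32


def quarter_round(w, a, b, c, d):
    """Standard ChaCha quarter-round on w[a], w[b], w[c], w[d], in place."""
    w[a] = (w[a] + w[b]) & MASK32
    w[d] = rotl32(w[d] ^ w[a], 16)
    w[c] = (w[c] + w[d]) & MASK32
    w[b] = rotl32(w[b] ^ w[c], 12)
    w[a] = (w[a] + w[b]) & MASK32
    w[d] = rotl32(w[d] ^ w[a], 8)
    w[c] = (w[c] + w[d]) & MASK32
    w[b] = rotl32(w[b] ^ w[c], 7)


# hypothetical index table: the straight (column) group, then the
# rotated (diagonal) group, in the same order as the macro calls above
QR_INDICES = [(i, 4 + i, 8 + i, 12 + i) for i in range(4)] + [
    (i, 4 + (i + 1) % 4, 8 + (i + 2) % 4, 12 + (i + 3) % 4) for i in range(4)
]


def double_round(w):
    """Apply all eight quarter-rounds (one double round) to the 16-word state."""
    for a, b, c, d in QR_INDICES:
        quarter_round(w, a, b, c, d)
```

the four quarter-rounds within each group touch disjoint words, which is
what makes it possible to run them in parallel.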

the ordering needed was initially believed to be 2D: cycling through
by row (y) before moving to the next column (x)

unfortunately it is necessary to stop half-way down the rows (y=0-3)
before moving on to the next column (x+=1), then after all columns
are done repeat the process with the 2nd group (y=4-7)
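
the difference between the two orderings can be sketched in Python as plain
loop nests. this is only an illustration of the traversal order, not of the
REMAP hardware or of svindex itself: the function names are hypothetical,
and the 4-column by 8-row layout of the Indices table is an assumption
taken from the y=0-3 / y=4-7 ranges above.

```python
def order_2d(xdim=4, ydim=8):
    """2D order: run down every row (y) before moving to the next column (x)."""
    return [(x, y) for x in range(xdim) for y in range(ydim)]


def order_3d(xdim=4, ydim=8, group=4):
    """3D order: stop after `group` rows, step x, and only move on to the
    next group of rows once all columns have been visited."""
    return [
        (x, y)
        for g in range(0, ydim, group)          # outermost: round-group
        for x in range(xdim)                    # middle: column
        for y in range(g, min(g + group, ydim)) # innermost: rows in group
    ]


if __name__ == "__main__":
    print(order_2d()[:9])
    print(order_3d()[:9])
```

both functions visit the same 32 (x, y) positions. only the order differs,
and the third (group) loop is what a 2D schedule cannot express.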

this is perfectly possible but requires a 3D version of svindex
and svshape2, which is too much.