| Summary: | SVP64 demo / unit test of CRM data-dependent ffirst mode implementing insertion sort | ||
|---|---|---|---|
| Product: | Libre-SOC's first SoC | Reporter: | Luke Kenneth Casson Leighton <lkcl> |
| Component: | Source Code | Assignee: | Luke Kenneth Casson Leighton <lkcl> |
| Status: | CONFIRMED --- | ||
| Severity: | enhancement | CC: | libre-soc-bugs |
| Priority: | --- | ||
| Version: | unspecified | ||
| Hardware: | Other | ||
| OS: | Linux | ||
| See Also: |
https://bugs.libre-soc.org/show_bug.cgi?id=672 https://bugs.libre-soc.org/show_bug.cgi?id=676 https://bugs.libre-soc.org/show_bug.cgi?id=897 https://bugs.libre-soc.org/show_bug.cgi?id=222 |
||
| NLnet milestone: | NLnet.2022-08-051.OPF | total budget (EUR) for completion of task and all subtasks: | 2000 |
| budget (EUR) for this task, excluding subtasks' budget: | 2000 | parent task for budget allocation: | 953 |
| child tasks for budget allocation: | The table of payments (in EUR) for this task; TOML format: | ||
| Bug Depends on: | |||
| Bug Blocks: | 952, 953 | ||
sublists are small enough to do with insertionsort
medians can use a (new) treereduce mode, with a predicate
of 0b00100 or 1<<r3 r3=2
low/high can use Vector cmp and VCOMPRESS with CR.gt/lt
k can be computed from cntlz
sublists once calculated can be part of timsort.
def median_of_medians(A, i):
sublists = [A[j:j+5] for j in range(0, len(A), 5)]
medians = [sorted(sublist)[len(sublist)/2] for sublist in sublists]
if len(medians) <= 5:
pivot = sorted(medians)[len(medians)/2]
else:
pivot = median_of_medians(medians, len(medians)/2)
low = [j for j in A if j < pivot]
high = [j for j in A if j > pivot]
k = len(low)
if i < k:
return median_of_medians(low,i)
elif i > k:
return median_of_medians(high,i-k-1)
else: #pivot = k
return pivot
#Here are some example lists you can use to see how the algorithm works
#A = [1,2,3,4,5,1000,8,9,99]
#B = [1,2,3,4,5,6]
#print median_of_medians(A, 0) #should be 1
#print median_of_medians(A,7) #should be 99
#print median_of_medians(B,4) #should be 5
this should be possible to do with vectorised CMP creating
a Vector of CRs. followed by 3 CR-Predicated VCOMPRESSed
mvs, one with CR.eq, one with CR.gt, the other with CR.lt
from random import randint
def quicksort(array):
if len(array) < 2:
return array
low, same, high = [], [], []
pivot = array[randint(0, len(array) - 1)]
for item in array:
if item < pivot:
low.append(item)
elif item == pivot:
same.append(item)
elif item > pivot:
high.append(item)
return quicksort(low) + same + quicksort(high)
actual core of timsort looks very similar to a butterfly
schedule. note that min_run is picked to ensure merges are
as close to equal size power 2 as possible.
def timsort(array):
min_run = 32
n = len(array)
for i in range(0, n, min_run):
insertion_sort(array, i, min((i + min_run - 1), n - 1))
size = min_run
while size < n:
for start in range(0, n, size * 2):
midpoint = start + size - 1
end = min((start + size * 2 - 1), (n-1))
merged_array = merge(
left=array[start:midpoint + 1],
right=array[midpoint + 1:end + 1])
array[start:start + len(merged_array)] = merged_array
size *= 2
return array
changing milestone to match parent task for budget allocation inverting the order:
def insertion_sort(array):
lim = len(array)-1
for i in range(lim,-1,-1):
key_item = array[i]
j = i + 1
while j <= lim and array[j] > key_item:
array[j - 1] = array[j]
j += 1
array[j - 1] = key_item
return array
ctr = alen-1
li r10, 1 # prepare mask
sld r10, alen, r10
addi r10, r10, -1 # all 1s. must be better way
loop:
setvl r3, ctr
sv.mv/m=1<<r3 key, *array # get key item
sld r10, 1 # shift in another zero MSB
sv.cmp/ff=GT/m=~r10 *0, *array, key # stop cmp at 1st GT fail
sv.mv/m=GT *array-1, *array # after cmp and ffirst
getvl r3
sub r3, 1 # reduce by one
sv.mv/m=1<<r3 *array, key # put key into array
bc 16, loop # dec CTR, back around
(In reply to Luke Kenneth Casson Leighton from comment #0) > it should be possible to do this using 1<<r3 predicate > for selection and insertion, or with unary masks like in maxloc. see bug #676 |
it should be possible to do this using 1<<r3 predicate for selection and insertion, data-dependent failfirst to do the inner loop (Reverse REMAP schedule) and a compare xchg instruction. failfirst will drop out with VL set to the failed element, getvl can copy this into r3 and use 1<<r3 as the predicate to insert key_item. excluding LDs it should be possible to do in registers in about 10 instructions. def insertion_sort(array): for i in range(1, len(array)): key_item = array[i] j = i - 1 while j >= 0 and array[j] > key_item: array[j + 1] = array[j] j -= 1 array[j + 1] = key_item return array