Bug 1078

Summary: svp64 counting sort demo
Product: Libre-SOC's first SoC Reporter: Luke Kenneth Casson Leighton <lkcl>
Component: DocumentationAssignee: Luke Kenneth Casson Leighton <lkcl>
Status: CONFIRMED ---    
Severity: enhancement CC: libre-soc-bugs
Priority: ---    
Version: unspecified   
Hardware: Other   
OS: Linux   
NLnet milestone: --- total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0 parent task for budget allocation:
child tasks for budget allocation: The table of payments (in EUR) for this task; TOML format:
Bug Depends on:    
Bug Blocks: 953    

Description Luke Kenneth Casson Leighton 2023-05-05 11:52:36 BST
https://en.m.wikipedia.org/wiki/Counting_sort#Pseudocode

function CountingSort(input, k)
    
    count ← array of k + 1 zeros
    output ← array of same length as input
    
    for i = 0 to length(input) - 1 do
        j = key(input[i])
        count[j] = count[j] + 1

    for i = 1 to k do
        count[i] = count[i] + count[i - 1]

    for i = length(input) - 1 down to 0 do
        j = key(input[i])
        count[j] = count[j] - 1
        output[count[j]] = input[i]

    return output
Comment 1 Luke Kenneth Casson Leighton 2023-05-05 12:07:12 BST
(In reply to Luke Kenneth Casson Leighton from comment #0)

>     for i = 0 to length(input) - 1 do
>         j = key(input[i])
>         count[j] = count[j] + 1

    setvl k
    svindex on keys
    sv.addi *count, *count, 1 # Indexed on count

>     for i = 1 to k do
>         count[i] = count[i] + count[i - 1]

    setvl k-1
    sv.add/mr *count+1, *count+1, *count
    (or use parallel prefix sum)

>     for i = length(input) - 1 down to 0 do
>         j = key(input[i])
>         count[j] = count[j] - 1
>         output[count[j]] = input[i]

VF mode, 2 stages:

    setvl len(input)
    svindex on keys
    sv.addi/mrr *count, *count, -1 # Indexed on count
    sv.addi/mrr *countcopy, *count # Indexed only on count

2nd stage (HF mode):

    svindex on countcopy:
    sv.addi *output, *input, 0 # Index countcopy on output

first step is doing a histogram. next is cumulative sum of histogram
counts.  then find unique areas. finally copy which has the fully
sorted order.

this algorithm critically relies on Dependency Hazards within the
hardware, to not mess up for example multiple REMAP-Indexed atomic
increments.
Comment 2 Luke Kenneth Casson Leighton 2023-05-05 15:54:03 BST
# python implementation
from typing import List

def get_max(arr: List[int]) -> int:
	mx = arr[0]
	for i in range(1, len(arr)):
		if arr[i] > mx:
			mx = arr[i]
	return mx

def count_sort(arr: List[int], exp: int):
	output = [0] * len(arr)
	count = [0] * 10

	# Store count of occurrences in count[]
	for i in range(len(arr)):
		count[(arr[i] // exp) % 10] += 1

	# Change count[i] so that count[i] now contains actual
	# position of this digit in output[]
	for i in range(1, 10):
		count[i] += count[i - 1]

	# Build the output array
	for i in range(len(arr) - 1, -1, -1):
		output[count[(arr[i] // exp) % 10] - 1] = arr[i]
		count[(arr[i] // exp) % 10] -= 1

	# Copy the output array to arr[], so that arr[] now
	# contains sorted numbers according to current digit
	for i in range(len(arr)):
		arr[i] = output[i]


def parallel_count_sort(arr: List[int]):
	m = get_max(arr)

	# Do counting sort for every digit. Note that instead
	# of passing digit number, exp is passed. exp is 10 ^ i
	# where i is current digit number
	exp = 1
	while m // exp > 0:
		for i in range(len(arr)):
			count_sort(arr, exp)
		exp *= 10

# Driver program to test above functions
def main():
	arr = [170, 45, 75, 90, 802, 24, 2, 66]
	print("Array before sorting:")
	print(arr)

	parallel_count_sort(arr)

	print("Array after sorting:")
	print(arr)

if __name__ == "__main__":
	main()

# This code is contributed by ksam24000

https://summerofhpc.prace-ri.eu/fastest-sorting-algorithm-for-distributed-systems-parallel-radix-sort-difficulty-medium/

https://dirtyhandscoding.github.io/posts/vectorizing-small-fixed-size-sort.html