| Summary: | svp64 counting sort demo | ||
|---|---|---|---|
| Product: | Libre-SOC's first SoC | Reporter: | Luke Kenneth Casson Leighton <lkcl> |
| Component: | Documentation | Assignee: | Luke Kenneth Casson Leighton <lkcl> |
| Status: | CONFIRMED --- | ||
| Severity: | enhancement | CC: | libre-soc-bugs |
| Priority: | --- | ||
| Version: | unspecified | ||
| Hardware: | Other | ||
| OS: | Linux | ||
| NLnet milestone: | --- | total budget (EUR) for completion of task and all subtasks: | 0 |
| budget (EUR) for this task, excluding subtasks' budget: | 0 | parent task for budget allocation: | |
| child tasks for budget allocation: | The table of payments (in EUR) for this task; TOML format: | ||
| Bug Depends on: | |||
| Bug Blocks: | 953 | ||
|
Description
Luke Kenneth Casson Leighton
2023-05-05 11:52:36 BST
(In reply to Luke Kenneth Casson Leighton from comment #0) > for i = 0 to length(input) - 1 do > j = key(input[i]) > count[j] = count[j] + 1 setvl k svindex on keys sv.addi *count, *count, 1 # Indexed on count > for i = 1 to k do > count[i] = count[i] + count[i - 1] setvl k-1 sv.add/mr *count+1, *count+1, *count (or use parallel prefix sum) > for i = length(input) - 1 down to 0 do > j = key(input[i]) > count[j] = count[j] - 1 > output[count[j]] = input[i] VF mode, 2 stages: setvl len(input) svindex on keys sv.addi/mrr *count, *count, -1 # Indexed on count sv.addi/mrr *countcopy, *count # Indexed only on count 2nd stage (HF mode): svindex on countcopy: sv.addi *output, *input, 0 # Index countcopy on output first step is doing a histogram. next is cumulative sum of histogram counts. then find unique areas. finally copy which has the fully sorted order. this algorithm critically relies on Dependency Hazards within the hardware, to not mess up for example multiple REMAP-Indexed atomic increments. # python implementation
from typing import List
def get_max(arr: List[int]) -> int:
mx = arr[0]
for i in range(1, len(arr)):
if arr[i] > mx:
mx = arr[i]
return mx
def count_sort(arr: List[int], exp: int):
output = [0] * len(arr)
count = [0] * 10
# Store count of occurrences in count[]
for i in range(len(arr)):
count[(arr[i] // exp) % 10] += 1
# Change count[i] so that count[i] now contains actual
# position of this digit in output[]
for i in range(1, 10):
count[i] += count[i - 1]
# Build the output array
for i in range(len(arr) - 1, -1, -1):
output[count[(arr[i] // exp) % 10] - 1] = arr[i]
count[(arr[i] // exp) % 10] -= 1
# Copy the output array to arr[], so that arr[] now
# contains sorted numbers according to current digit
for i in range(len(arr)):
arr[i] = output[i]
def parallel_count_sort(arr: List[int]):
m = get_max(arr)
# Do counting sort for every digit. Note that instead
# of passing digit number, exp is passed. exp is 10 ^ i
# where i is current digit number
exp = 1
while m // exp > 0:
for i in range(len(arr)):
count_sort(arr, exp)
exp *= 10
# Driver program to test above functions
def main():
arr = [170, 45, 75, 90, 802, 24, 2, 66]
print("Array before sorting:")
print(arr)
parallel_count_sort(arr)
print("Array after sorting:")
print(arr)
if __name__ == "__main__":
main()
# This code is contributed by ksam24000
https://summerofhpc.prace-ri.eu/fastest-sorting-algorithm-for-distributed-systems-parallel-radix-sort-difficulty-medium/
https://dirtyhandscoding.github.io/posts/vectorizing-small-fixed-size-sort.html
|