Show HN: Rapidgzip – Truly Parallel Gzip Decompression with 10 GB/s

I posted a much earlier version of this over a year ago [0]. A lot has changed since then. Most obviously, the name has changed, which happened for the paper publication [1]. I have also optimized the speed, integrated ISA-L for special cases, limited the compression-ratio-dependent maximum memory consumption, and finally added parallelized CRC32 computation, which adds ~5% overhead no matter the number of cores used.

At this point, I am leaning towards calling it production-ready, although there are still many ideas for improvements.

Redoing the benchmarks of the older Show HN would look like this:

    time pigz -d -c 4GiB-base64.gz | wc -c       # real ~13.4 s -> ~320 MB/s
    time rapidgzip -d -c 4GiB-base64.gz | wc -c  # real ~1.26 s -> ~3.4 GB/s

However, at this point, the piping itself becomes a problem. Rapidgzip is actually slightly faster than cat when comparing the piped bandwidth! E.g., compare these additional benchmarks:

    time cat 4GiB-base64.gz | wc -c                 # real ~1.06 s -> ~3.1 GB/s
    time fcat 4GiB-base64.gz | wc -c                # real ~0.41 s -> ~8.0 GB/s
    time rapidgzip -o /dev/null -d 4GiB-base64.gz   # real ~0.68 s -> ~6.5 GB/s

fcat is an alternative cat implementation that uses vmsplice to speed up piping. According to its ReadMe, it is currently broken, but it works fine on my system, and piping it to md5sum yields consistent results [2].

So, at this point, I/O and also allocations have become a limiting factor. If you want full speed, you have to interface with the rapidgzip library directly (in C++ or via the Python bindings) and process the decompressed data in memory.

The project ReadMe contains further benchmarks with Silesia and FASTQ data and scaling up to 128 cores, for which rapidgzip achieves 12 GB/s for Silesia, and 24 GB/s when an index has been created with --export-index and is used with --import-index.
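To illustrate what processing the decompressed data in memory could look like via the Python bindings, here is a minimal sketch. It assumes the rapidgzip Python package is installed and that rapidgzip.open takes a file path plus a parallelization argument. The helper names open_gzip and count_decompressed_bytes are hypothetical and only serve the example. If the package is missing, the sketch falls back to the single-threaded gzip module from the standard library so that it still runs.

```python
import gzip
import os
import time

try:
    import rapidgzip  # assumption: installed, e.g., with "pip install rapidgzip"
except ImportError:
    rapidgzip = None


def open_gzip(path):
    """Return a readable file object that yields decompressed bytes."""
    if rapidgzip is not None:
        # Assumption: rapidgzip.open accepts a path and a thread count.
        return rapidgzip.open(path, parallelization=os.cpu_count() or 1)
    # Fallback: single-threaded decompression from the standard library.
    return gzip.open(path, "rb")


def count_decompressed_bytes(path, chunk_size=4 * 1024 * 1024):
    """Decompress chunk by chunk in memory; return (byte count, seconds)."""
    total = 0
    start = time.perf_counter()
    with open_gzip(path) as decompressed:
        while True:
            chunk = decompressed.read(chunk_size)
            if not chunk:
                break
            # Process the chunk here instead of piping it to another process.
            total += len(chunk)
    return total, time.perf_counter() - start
```

Calling count_decompressed_bytes("4GiB-base64.gz") and dividing the returned byte count by the returned seconds gives a bandwidth that can be compared with the piped numbers above, without pipe throughput as a confounding factor.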
It can also be tested with ratarmount 0.14.0, which now uses rapidgzip as a backend by default for .gz and .tar.gz files [3].

[0] https://ift.tt/wrWMnQ0
[1] https://ift.tt/rWf1C5G
[2] https://ift.tt/NlQYxLV
[3] https://ift.tt/lcP9hzD

https://ift.tt/Gce1D8P
September 3, 2023 at 10:29PM