Pletter (or other compression tools) performance

By albs_br

Champion (405)

21-07-2022, 22:22

Hi guys,
Is there any information on the performance of Pletter (or other compression tools) when decompressing data?
Specifically, decompressing .SCA data (a Screen 11 bitmap).
I'm currently storing the raw .SCA file and copying it to VRAM to do vertical scrolling, 256 bytes (one line) at a time.

Is it feasible to decompress it and spit it out to VRAM using little CPU time?

Decompressing the whole image to RAM before gameplay starts is not an option, as the image is huge (more than 1000 lines, around 256 KB).

Thanks.

By Metalion

Paragon (1565)

22-07-2022, 11:40

I once did a comparison of all the Z80 compression tools, but it was a virtual one, meaning I only compared theoretical values for decompression code size, compression ratio and decompression speed. The best of them all was zx0. Pletter was in 45th position (there were a lot of tool variations, each of them taking a position). But, of course, performance largely depends on your specific needs and the data you're compressing. I would still do a test with zx0, though.

Decompressing the image directly to VRAM is feasible, but not all tools offer this possibility (specifically, the code to do it). I know zx0 does; I've seen the VRAM version on GitHub. The problem is that good compression tools use a "data string dictionary", which is stored in the decompressed data as it goes along. This means that if you store the data in VRAM, the decompressor will have to read data back from VRAM to complete decompression, which in turn will slow down the whole process.

In conclusion, it's feasible, but you might have to adapt the code to do it, and it will be noticeably slower than decompressing to RAM.

By Grauw

Ascended (10623)

22-07-2022, 11:53

Metalion wrote:

The problem is that good compression tools use a "data string dictionary", which is stored in the decompressed data as it goes along. Meaning that if you store the data in VRAM, it will have to read back data in VRAM to complete decompression. Which in turn will slow the whole process.

You can keep a mirror in RAM to speed this up which only needs memory equal to the dictionary size. The extra performance cost per byte would be only the cost of an OUT instruction, potentially even less since the algorithm can be optimised to use an aligned circular buffer for the dictionary… Especially if dictionary size is 256 bytes it could speed up offset calculations for back references a fair bit. So I don’t think it’s a given that it’s going to be (much) slower.
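To make the RAM mirror idea concrete, here is a minimal host-side sketch in Python, not Z80 code and not the Pletter or zx0 format. The token stream (`("lit", byte)` and `("copy", offset, length)`), the `decode_to_vram` name and the `vram_out` callback are all assumptions made up for illustration; `vram_out` stands in for an OUT to the VDP data port.

```python
WINDOW = 256  # assumed dictionary size; a power of two keeps wrapping to a single AND


def decode_to_vram(tokens, vram_out):
    """Decode a hypothetical LZ token stream into a write-only sink.

    tokens: iterable of ("lit", byte) or ("copy", offset, length),
            with 1 <= offset <= WINDOW.
    vram_out: callable taking one byte (stand-in for OUT to the VDP).
    """
    ring = bytearray(WINDOW)  # RAM mirror of the last WINDOW output bytes
    pos = 0                   # next write position inside the ring

    def emit(value):
        nonlocal pos
        vram_out(value)       # the extra per-byte cost compared to a RAM target
        ring[pos] = value
        pos = (pos + 1) & (WINDOW - 1)

    for token in tokens:
        if token[0] == "lit":
            emit(token[1])
        else:
            _, offset, length = token
            src = (pos - offset) & (WINDOW - 1)
            for _ in range(length):
                emit(ring[src])   # back reference is read from RAM, never from VRAM
                src = (src + 1) & (WINDOW - 1)
```

With a 256-byte window the ring index fits in one byte, which is why an aligned buffer would make the wrap-around free on a Z80: only the low byte of the pointer ever changes.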

By Amaury Carvalho

Resident (34)

22-07-2022, 20:37

When creating the MSXBAS2ROM compiler, I chose to use Pletter to compress image files that are loaded via the BLOAD command in MSX BASIC code.

The compression is done in 256-byte blocks of the raw image. Each block is saved inside the compiled ROM file in a way that makes it easy to decompress the blocks to RAM and then copy them one by one to VRAM.

You can get an idea of the load time when unpacking the Pletter format by looking at DEMO5 from this link. Loading the image into VRAM is faster than the BLOAD used by ROM BASIC, but the delay is still noticeable.

I think, however, that a pure assembly program could speed up this Pletter load time a little more, but not by decompressing byte by byte directly to VRAM. The tests I made earlier showed that decompressing blocks to RAM (and then copying them to VRAM) is the faster approach.
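A rough host-side illustration of the block-wise layout, in Python. This is not the MSXBAS2ROM code: `zlib` is used purely as a stand-in for Pletter, and `pack_blocks`, `unpack_line` and the `copy_to_vram` callback are hypothetical names. The point is only that independently packed 256-byte blocks can each be unpacked into a small RAM buffer and then copied in one go.

```python
import zlib  # stand-in compressor, NOT Pletter

BLOCK = 256  # one Screen 11 line per block


def pack_blocks(raw):
    """Compress each BLOCK-byte slice on its own so any line can be unpacked alone."""
    return [zlib.compress(raw[i:i + BLOCK]) for i in range(0, len(raw), BLOCK)]


def unpack_line(blocks, line, copy_to_vram):
    """Unpack one block into a RAM buffer, then pass the whole buffer to the VRAM copy."""
    buffer = zlib.decompress(blocks[line])
    copy_to_vram(buffer)
    return len(buffer)
```

Because no block refers to data outside itself, a vertical scroller could unpack just the line it needs next, at the price of a worse compression ratio than packing the whole image as one stream.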

By santiontanon

Paragon (1661)

23-07-2022, 10:43

About "is there some information on the performance of Pletter (or other compression tools) decompressing data?": there have been many, many comparisons of this online in different forums. Just linking a couple:
- A very comprehensive one: https://github.com/uniabis/z80depacker
- A more visual one: https://www.cpcwiki.eu/forum/programming/new-cruncher-zx0/ms...

By albs_br

Champion (405)

25-07-2022, 13:23

Thanks for all the help.
For my specific case (SC 11 bitmap data) I have one unused bit (bit 3 selects YJK or palette, but I never use the palette), so I can use this bit as a repetition flag in a simple RLE encoding.

Code would be like this (untested):

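; Input
;   HL: pointer to encoded data
; Assumes runs never cross a line boundary and run counts are 1..255
	ld	b, 0 			; bytes left in this line (0 = 256)
	ld	c, PORT

.loop:
	ld	a, 00001000b		; mask for the repetition flag (reloaded each pass, AND overwrites A)
	and	(hl)
	jp	nz, .decodeRLE
	outi				; out (c),(hl) / inc hl / dec b
	jp	nz, .loop		; OUTI already decrements B and sets Z at 0, so no DJNZ
	jp 	.exit

.decodeRLE:
	ld	a, (hl)		; read byte that will be repeated
	and	11110111b	; clear repetition flag
	ld	e, a
	inc	hl
	ld	d, (hl)		; number of repetitions
	inc	hl
	; b = b - d
	ld	a, b
	sub	d
	ld	b, a		; store the new remaining count
.loop_decodeRLE:
	out	(c), e
	dec	d
	jp	nz, .loop_decodeRLE

	inc	b		; test B for zero (DEC D clobbered the flags)
	dec	b
	jp	nz, .loop
.exit: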
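
For the PC side, a possible encoder for this scheme could look like the Python sketch below, together with a reference decoder for round-trip checks. It assumes the format implied by the Z80 code: a byte with bit 3 set is followed by a count byte, runs stay inside one 256-byte line, and counts are 1..255. The function names and the `MIN_RUN` threshold are my own choices, not part of any existing tool.

```python
FLAG = 0x08   # bit 3: unused in this Screen 11 data, reused as the run marker
LINE = 256    # bytes per line; runs are kept inside one line
MIN_RUN = 3   # a run costs 2 bytes, so shorter runs are stored as literals


def rle_encode_line(line):
    """Encode one 256-byte line; every input byte must have bit 3 clear."""
    if len(line) != LINE or any(b & FLAG for b in line):
        raise ValueError("expected 256 bytes with bit 3 clear")
    out = bytearray()
    i = 0
    while i < LINE:
        run = 1
        while i + run < LINE and run < 255 and line[i + run] == line[i]:
            run += 1
        if run >= MIN_RUN:
            out += bytes((line[i] | FLAG, run))  # flagged value, then count
        else:
            run = 1
            out.append(line[i])                  # plain literal
        i += run
    return bytes(out)


def rle_decode_line(data):
    """Reference decoder mirroring the Z80 loop: stops after exactly 256 bytes."""
    out = bytearray()
    i = 0
    while len(out) < LINE:
        b = data[i]
        if b & FLAG:
            out += bytes([b & ~FLAG & 0xFF]) * data[i + 1]
            i += 2
        else:
            out.append(b)
            i += 1
    return bytes(out)
```

A line of 256 identical bytes encodes as one run of 255 plus one literal, 3 bytes in total, since the count byte cannot express 256.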

By Metalion

Paragon (1565)

25-07-2022, 14:10

santiontanon wrote:

- A very comprehensive one: https://github.com/uniabis/z80depacker

It's actually the one I used.
Here is the top 20 from my weighted comparison.