Gunzip for MSX

Page 5/11

By Louthrax

Prophet (2083)

23-10-2015, 02:09

Fits in RAM. Works. To be released soon. Thanks Grauw :)

By Grauw

Ascended (8457)

27-10-2015, 15:22

Louthrax: Super awesome! :)

tunzip 0.91 was recently released and fixes the bug I encountered, so I’ve updated the test results in the second post.

I’ve also received some nice performance improvement suggestions from Wouter Vermaelen which I’ll try out soon.

By Louthrax

Prophet (2083)

27-10-2015, 16:49

Hi Grauw,

Any idea whether that new optimization will use more memory? I have a little margin left in SofaUnZip, but not much!

By NYYRIKKI

Enlighted (5388)

27-10-2015, 17:10

This is somehow so wrong... First I have to wait 18 years for someone to make a proper ZIP extractor, and then suddenly I have to choose between two alternatives!

Well... the good thing is that now I can FINALLY update the downloads on my www-page to a format that causes less trouble for PC-users. (Oh, well... if I find inspiration to do something about the page some day.)

By Grauw

Ascended (8457)

27-10-2015, 17:49

Louthrax: I’ll do my best to keep memory usage at the same level, and reduce it if possible.

NYYRIKKI: SofaUnZip shares the same high performance as gunzip, plus nice long file name and directory support, so you can guess my recommendation :)

By Prodatron

Paragon (1788)

27-10-2015, 20:42

I just saw it today, this is really so cool! I love your way of structuring the code (again, as usual! :) ) and of optimizing (concentrate on the really important stuff, but don't mess it all up for the sake of a few meaningless cycles). Just great!
What I didn't get is why you need to jump between these "readers"/"writers" all the time. In other words, why does this have to be dynamic (using IX and IY) all the time? Aren't these static structures overall when deflating one file?

By Grauw

Ascended (8457)

27-10-2015, 22:18

Thanks for your kind words :)

It’s true, they are only allocated once and it could all be static for this particular use case, but I set up the code to be all dynamically allocated so it can be used as flexibly as possible. Deflate two files at once? Have your code in ROM? No problem. Besides, it’s much easier to go from there to static stuff than the other way around. It’s the basic premise of how I write code in general nowadays. I think it makes the code more structured and easier to read and modify, and I optimised based on that architecture.

But IX/IY isn’t as slow as you’d imagine… they’re amazingly flexible, just like HL, so you can load directly from any index into any register, compare, inc/dec/shift straight to memory, etc. If I didn’t use them, not only would I have two fewer registers, but I’d also have to keep all the necessary data in the remaining registers and worry about preserving them all the time… Juggling registers and pushing/popping all the time isn’t cheap either, and the code would be a lot messier.

And with this approach, it’s easy for me to generate little snippets of code in the objects, as you can see in the reader/writer, which is a great way to speed things up. In the hottest loop, it actually only uses inc (ix + Reader.bufferPosition) and inc (iy + Writer.bufferPosition), which doubles as a boundary (zero) check. It would be slower if I did it without the index registers, and it would modify more registers.
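
A minimal sketch of why a single 8-bit increment can double as the boundary check, written in Python rather than Z80 so the wrap-around is explicit. It assumes the buffer is exactly 256 bytes, so only the low byte of the position needs to be tracked; that layout and the names below (`ByteReader`, `refill`) are illustrative assumptions, not taken from the gunzip source.

```python
class ByteReader:
    """Models a reader whose position is a single 8-bit counter.

    Assumption (hypothetical, for illustration): the buffer is exactly
    256 bytes, so the position wrapping to 0 means the buffer is exhausted.
    """

    BUFFER_SIZE = 256

    def __init__(self, data: bytes):
        self.data = data
        self.offset = 0      # how much of `data` has been buffered so far
        self.position = 0    # 8-bit position within the current buffer
        self.refills = 0
        self.buffer = b""
        self.refill()

    def refill(self) -> None:
        # Load the next 256-byte chunk; this stands in for a disk read.
        self.buffer = self.data[self.offset:self.offset + self.BUFFER_SIZE]
        self.offset += self.BUFFER_SIZE
        self.refills += 1

    def read(self) -> int:
        value = self.buffer[self.position]
        # Counterpart of `inc (ix + position)`: 8-bit increment with wrap.
        self.position = (self.position + 1) & 0xFF
        # On the Z80 the INC itself sets the zero flag, so no separate
        # compare is needed: zero means the boundary was crossed.
        if self.position == 0:
            self.refill()
        return value
```

Reading 512 bytes through this reader triggers a refill exactly when the counter wraps, without ever comparing the position against a buffer length.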

The Copy_IY method which is part of the second-hottest loop could maybe be written to be a little faster in isolation, but again it already modifies all registers so it’d be tricky to achieve because you just don’t have much left to work with, and all the code that calls it would also have to preserve the reader and writer positions, so…

Ten years ago I used to avoid IX/IY like the plague, because I had the misconception that they were slow. (They are, but they more than make up for it in flexibility.) I would get stuck so often, running out of registers and adding offsets all over the place, that it really was an inefficient use of my time and required lots of debugging. Nowadays it smells to me like premature optimisation, and it’s not even guaranteed to be faster…

Long story short, I learned to love the index registers. In a way you can see the gunzip code base and its great performance results as proof that using objects and IX/IY to access them can result in really fast code! :)

By Prodatron

Paragon (1788)

28-10-2015, 00:03

Grauw, thanks a lot for this very detailed explanation!

I like to use IX/IY for such dynamic stuff a lot, so I absolutely agree. E.g. the object-oriented GUI of SymbOS would be (nearly) impossible without IX/IY.

My only concern was that if you really have a static structure, or let's say only one "instance", using IX/IY is probably slower:

It's faster to use LD BC,(nnnn) [20 T-states]...
...compared to LD C,(IX+x) : LD B,(IX+x+1) [38 T-states]

Now the interesting question is whether such an optimization really changes the final result. OK, and my problem is to somehow replace the usage of the Z80's second register set, as it's reserved and not allowed to be used in SymbOS :)
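
The cycle counts quoted above can be put into a quick back-of-the-envelope calculation. This is only a sketch: the 20 and 38 T-state figures are the ones from this post, while the number of loads and the clock rate are assumptions picked for illustration, and any extra wait states the machine adds are ignored.

```python
# T-state figures as quoted in the post above.
T_LD_BC_DIRECT = 20     # LD BC,(nnnn)
T_LD_BC_INDEXED = 38    # LD C,(IX+x) : LD B,(IX+x+1)

# Assumptions for illustration only.
CLOCK_HZ = 3_579_545    # nominal MSX Z80 clock
LOADS = 100_000         # hypothetical number of 16-bit loads executed


def seconds(t_states_per_load: int, loads: int = LOADS) -> float:
    """Time spent on `loads` loads at the assumed clock rate."""
    return t_states_per_load * loads / CLOCK_HZ


extra_t_states = T_LD_BC_INDEXED - T_LD_BC_DIRECT
extra_seconds = seconds(T_LD_BC_INDEXED) - seconds(T_LD_BC_DIRECT)

print(f"extra T-states per load: {extra_t_states}")
print(f"extra time for {LOADS} loads: {extra_seconds * 1000:.1f} ms")
```

Whether that difference matters in practice depends on how often such loads actually occur in the hot loops, which is exactly the open question here.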

By Louthrax

Prophet (2083)

28-10-2015, 00:19

The Hitech C compiler (and I think others too) also relies heavily on the index registers for "stack management" (local variables and parameters) and for structures. It looks like those registers were designed with that in mind, and the performance is not so bad compared to what you would have to push/pop with the normal registers.

I was also tempted to change Grauw's code into a "static" version. The only advantage would be the code size (IX/IY instructions are bigger than direct access), not so much the speed itself. But for now, I won't. Integrating Grauw's new optimizations will be much easier and safer that way :)
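
To put a number on the code size point, here is a small sketch that assembles both variants of a 16-bit load by hand using the standard Z80 opcodes. The address `0xC000` and the field offset `4` are made-up values for illustration, and the helper names are hypothetical.

```python
def ld_bc_direct(address: int) -> bytes:
    """LD BC,(nn): ED 4B followed by the little-endian 16-bit address."""
    return bytes([0xED, 0x4B, address & 0xFF, (address >> 8) & 0xFF])


def ld_bc_indexed(offset: int) -> bytes:
    """LD C,(IX+d) : LD B,(IX+d+1), encoded as DD 4E d and DD 46 d+1."""
    return bytes([0xDD, 0x4E, offset & 0xFF,
                  0xDD, 0x46, (offset + 1) & 0xFF])


static_code = ld_bc_direct(0xC000)   # hypothetical variable address
indexed_code = ld_bc_indexed(4)      # hypothetical field offset

print(len(static_code), static_code.hex())     # 4 ed4b00c0
print(len(indexed_code), indexed_code.hex())   # 6 dd4e04dd4605
```

So for this particular load the static form saves two bytes; the total saving would depend on how many such accesses the code contains.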

By the way, the source code of SofaUnZip will be released too, along with the final 1.0 version.

By edoz

Prophet (2172)

28-10-2015, 11:39

Too technical for me, but very cool to read this post! Very cool to see such development.
