SofaPak teaser

By Louthrax

Prophet (2084)

11-10-2015, 22:25

I had several requests for "long file names" support in SofaRun, and this was one of my top priorities for the next release. I have been thinking about the best way to do that for a long time (including patching MSXDOS2 so that the findfirst / findnext functions retrieve the FAT16 long file names stored on the SD card), and finally settled on this approach: a "zip-like" tool with:

  • Fast unpacking speed, so that you can unpack a file and play it in a decent time.
  • Long file names support.
  • Sub-directories support.

I'm almost done. Here's a small teaser where I list and uncompress (using wildcards) Aleste 2 disk images from a "TOSEC" disk archive:

[Image: SofaPak teaser]

Some technical details for those interested:

  • The packing method is the awesome "Bitbuster" one from TeamBomba.
  • The command line tool (SP.COM) works the same way on PC and MSX, and is written in pure generic C (except unpacking routines on MSX), so that will be easy to port to Mac/Linux.
  • 7zip-like syntax for the command line (a for adding, l for listing, e for extracting...).
  • Compression (with sub-directories) also works on MSX.
  • A 720KB disk image is uncompressed in +/- 60s on a normal MSX2.
  • File format is close to standard "zip", with all entries stored at the end. I was originally looking for a (7)zip-compatible format that could be uncompressed fast enough (and within memory constraints) on MSX, but I have not found one.

Any reactions or comments are welcome.

By Grauw

Ascended (8457)

11-10-2015, 23:16

Very interesting, I was already wondering how you would go about doing that… Too bad you can’t read the long filenames straight from disk, the directory entries should be there but I guess it’s hard to access them.

About the location of the file entry header, wouldn’t it be a bit faster to read if the header was at the start of the file? Because then the disk doesn’t have to seek. Afaik for zip they put it at the end to support appending archives etc., but I don’t think that’s a constraint you need to worry about.

Anyway I think I could use the same approach for VGMPlay. Once I’ve added VGZ support, the gzipped files should have a long filename in their header… Unfortunately, for VGM files the GD3 tag with all the author info etc. is at the end, so for gzipped files it means I’ll have to inflate the whole thing first before I can show it to the user.
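To illustrate the gzip header idea, here is a minimal Python sketch that pulls the original file name out of a gzip member header, following the field order in RFC 1952 (fixed 10-byte header, optional FEXTRA block, then the zero-terminated FNAME). The function name `gzip_original_name` is made up for this example, and this is not VGMPlay code; it only shows that the name can be read without inflating anything.

```python
import struct

FEXTRA, FNAME = 0x04, 0x08  # flag bits defined in RFC 1952


def gzip_original_name(data: bytes):
    """Return the original file name stored in a gzip header, or None if absent."""
    if len(data) < 10 or data[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    flags = data[3]
    pos = 10  # fixed part: magic(2), CM, FLG, MTIME(4), XFL, OS
    if flags & FEXTRA:
        # Optional extra field: 2-byte little-endian length, then that many bytes.
        (xlen,) = struct.unpack_from("<H", data, pos)
        pos += 2 + xlen
    if not flags & FNAME:
        return None
    end = data.index(b"\x00", pos)  # the name is zero-terminated
    return data[pos:end].decode("latin-1")  # RFC 1952 specifies ISO 8859-1
```

Only the first bytes of the file are needed, so a file browser could list long names cheaply; the GD3 tag at the end of the VGM data still requires inflating the stream.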

I’m implementing the gunzip support for VGMPlay in a separate tool first (it simplifies things a bit during development), the idea is also that it’d be reusable, but it’s not ready yet so it won’t be of any help to you right now. Also I don’t know how well it will perform yet… Although deflate is one of the faster compression formats on PC, so I do hope I can get it to decompress faster than the VGM file takes to play back so I can decompress on the fly. I don’t imagine it’ll be able to beat Bitbuster though. But anyway, maybe I’ll find some time in the near future to finish that code.

By Louthrax

Prophet (2084)

12-10-2015, 00:04

Hi Grauw,

Reading long file names from disk would be something to add to Nextor. Maybe not full support, just retrieving the long file name from an opened file or in the find first / find next functions. Doing that outside of the Nextor kernel requires parsing the "FAT16 structure", which is definitely too hard.

About the position of the file entry header (or footer), it only slows down floppies, but has no impact on SD cards (maybe a bit on HDs). As you said, the benefit of putting this at the end is that it's easy to update an archive (add or delete a file): just chop the footer off and copy it to memory or disk, add the compressed data, and copy the updated footer back. I've not added that yet in SofaPak (the archive is rewritten completely each time), but I think I prefer to keep this option open.
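The chop / append / re-write sequence can be sketched in Python as follows. The layout used here is purely hypothetical (it is not the .SPA or zip format): file data first, then a directory of (name, offset, size) records, then 8 trailing bytes giving the directory offset and entry count. All function names are invented for the illustration.

```python
import io
import struct


def write_footer(f, entries):
    """Write the directory at the current position, followed by its offset and count."""
    start = f.tell()
    for name, offset, size in entries:
        raw = name.encode("utf-8")
        f.write(struct.pack("<HII", len(raw), offset, size) + raw)
    f.write(struct.pack("<II", start, len(entries)))


def read_footer(f):
    """Return (directory offset, list of (name, offset, size)) from the archive end."""
    f.seek(-8, io.SEEK_END)
    start, count = struct.unpack("<II", f.read(8))
    f.seek(start)
    entries = []
    for _ in range(count):
        name_len, offset, size = struct.unpack("<HII", f.read(10))
        entries.append((f.read(name_len).decode("utf-8"), offset, size))
    return start, entries


def append_file(f, name, payload):
    """Add one file without rewriting the data already stored in the archive."""
    start, entries = read_footer(f)  # keep the old directory in memory
    f.seek(start)
    f.truncate()                     # chop the footer off
    f.write(payload)                 # new data lands where the footer was
    entries.append((name, start, len(payload)))
    write_footer(f, entries)         # copy the updated footer back
```

With the directory at the start instead, adding a file would shift every stored byte, which is the trade-off being discussed here.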

About the VGZ files: in the zip file format, I think there's room for variable-length "comments" or meta-data for each file. Not sure if gzip handles that. Of course those comments are lost if you uncompress it with gzip, but that could be an idea (duplicating the info you need there ?).

Talking about Bitbuster, it's using a simple LZSS/LZ77 compression with variable length and byte-aligned encoding. No Huffman, arithmetic coding or other extra layers. The Z80 unpacking routine is amazingly short and fast, and the compression ratio is not so far from ZIP or PMARC, depending on the input files. Going for more sophisticated methods like those mentioned above will slow down the unpacking a lot. Also, Bitbuster does not use extra memory, except for an "output buffer" block (its size can be set at compression time with SP.COM).
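To show why an LZ77-style unpacker without an entropy-coding layer can be so short, here is a toy decoder in Python. The stream layout is an assumption invented for this sketch and is NOT the real Bitbuster bitstream: a flag byte announces the next 8 tokens (least significant bit first), a 0 bit means one literal byte, a 1 bit means a match stored as two bytes (distance, then length).

```python
def lz_unpack(packed: bytes) -> bytes:
    """Decode a toy byte-aligned LZ77/LZSS stream (hypothetical format, see above)."""
    out = bytearray()
    pos = 0
    while pos < len(packed):
        flags = packed[pos]
        pos += 1
        for bit in range(8):
            if pos >= len(packed):
                break  # stream ended before all 8 tokens were used
            if flags & (1 << bit):
                distance, length = packed[pos], packed[pos + 1]
                pos += 2
                if distance == 0 or distance > len(out):
                    raise ValueError("match points outside the output")
                # Copy byte by byte so overlapping matches (distance < length) work.
                for _ in range(length):
                    out.append(out[-distance])
            else:
                out.append(packed[pos])  # literal byte
                pos += 1
    return bytes(out)
```

For example, `bytes([0x08]) + b"abc" + bytes([3, 6])` holds three literals and one match that copies 6 bytes from 3 bytes back, giving `b"abcabcabc"`. The only working memory is the output itself, which matches the "no extra memory except the output buffer" remark.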

Some benchmarks with PMEXT.COM and SP.COM using Aleste 2 disk 1 on turboR:

  • PMEXT.COM: compressed file size: 247,424 bytes, decompression time: 1min58s
  • SP.COM: compressed file size: 272,851 bytes, decompression time: 8s

PMEXT could certainly be sped up by not using CP/M functions and with better buffering, but the speed difference will remain significant I think.

Do you have an idea of the max bitrate needed for VGMPlay ?

By Grauw

Ascended (8457)

12-10-2015, 00:55

Louthrax wrote:

Reading long file names from disk would be something to add to Nextor. Maybe not full support, just retrieving the long file name from an opened file or in the find first / find next functions. Doing that outside of the Nextor kernel requires parsing the "FAT16 structure", which is definitely too hard.

Let’s hope konamiman’s listening :).

Louthrax wrote:

About the VGZ files: in the zip file format, I think there's room for variable-length "comments" or meta-data for each file. Not sure if gzip handles that. Of course those comments are lost if you uncompress it with gzip, but that could be an idea (duplicating the info you need there ?).

Yeah it could be, but I’m working with pre-made files here so if the information isn’t already there… Once I make a file browsing and song queueing type of UI, it’s probably best to work with a cache of sorts. Or just show the long filenames, and only show the tag data for the currently playing song, once it finishes decompressing.

Louthrax wrote:

Talking about Bitbuster, it's using a simple LZSS/LZ77 compression with variable length and byte-aligned encoding. […] The Z80 unpacking routine is amazingly short and fast […] Also, Bitbuster does not use extra memory, except for an "output buffer" block.

Good to know, I figured it skipped the Huffman coding, it’s inefficient to process. Gzip’s Deflate doesn’t use too much memory either, the LZ decoding needs a 32K lookback but that’s not really a concern if you’re loading into memory, and when writing to disk it’s also a size that fits within the Z80’s addressable space. But I do need some memory to store the Huffman structures, and the code is bulkier as well of course.

Quote:
  • PMEXT.COM: compressed file size: 247,424 bytes, decompression time: 1min58s
  • SP.COM: compressed file size: 272,851 bytes, decompression time: 8s

PMEXT could certainly be sped up by not using CP/M functions and with better buffering, but the speed difference will remain significant I think.

8 seconds, really nice. The majority of that must be spent on disk I/O, just a fraction on decompression.

Quote:

Do you have an idea of the max bitrate needed for VGMPlay ?

Depends on the song, but if you disregard songs with sample data, looking at two of the worst cases in terms of uncompressed file size and playback CPU consumption, one is 106K (932K uncompressed) for 4:36, so that’s 0.38K / sec, and the other is 42K for 1:47 (566K uncompressed), that’s 0.39K / sec. Looks like PMEXT does 2.1K / sec. Of course I also need to spend CPU time on playback, but it does seem achievable. I’m generating my Huffman tables as code which should be efficient, so I hope that will also help to achieve the bitrate.
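The arithmetic behind these figures, as a small Python check. The PMEXT rate uses the benchmark quoted earlier in the thread (247,424 compressed bytes in 1min58s on turboR), and "K" is taken as 1000 bytes for that line, which is an assumption on my part.

```python
def rate_k_per_s(size_k: float, minutes: int, seconds: int) -> float:
    """Compressed input consumed per second of wall-clock time, in K/s."""
    return size_k / (minutes * 60 + seconds)


song_a = rate_k_per_s(106, 4, 36)            # 106K over 276 s, about 0.38 K/s
song_b = rate_k_per_s(42, 1, 47)             # 42K over 107 s, about 0.39 K/s
pmext = rate_k_per_s(247_424 / 1000, 1, 58)  # about 2.1 K/s (K = 1000 bytes assumed)

# How many times faster than real time the decompressor consumes its input.
headroom = pmext / max(song_a, song_b)
print(f"{song_a:.2f} {song_b:.2f} {pmext:.1f} headroom x{headroom:.1f}")
```

So a decompressor running at PMEXT speed would have roughly a fivefold margin over the worst-case songs, before subtracting the CPU time spent on playback itself.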

By Grauw

Ascended (8457)

18-10-2015, 01:36

Louthrax wrote:

Some benchmarks with PMEXT.COM and SP.COM using Aleste 2 disk 1 on turboR:

  • PMEXT.COM: compressed file size: 247,424 bytes, decompression time: 1min58s
  • SP.COM: compressed file size: 272,851 bytes, decompression time: 8s

My gunzip implementation takes 26s. If I disable checksum validation, 20s. On Z80 it takes 133s.

It’s nearly ready to release, I’ll make a forum post about it soon. Edit: linky.

By Louthrax

Prophet (2084)

18-10-2015, 19:03

I'm wondering about the point of my new .SPA format now. Hopefully, not everything I've coded for SofaPak is lost. To be continued :)...

By Grauw

Ascended (8457)

18-10-2015, 20:01

Oh, that was not my intention…

I think SofaPak has quite a significant speed advantage in loading time, especially on MSX2 but also on turboR. Why wait 2 minutes before you can play Ys or Metal Gear 2 when it could be one? :) Also I looked at the BitBuster code and it’s really super short and elegant, incomparable to gunzip, so it may fit better within the SofaRun and other tools’ executable memory requirements.

By Louthrax

Prophet (2084)

18-10-2015, 20:18

The plan for SofaPak / SofaRun is mainly to support long file names, in an easy way. Here's what I have in mind:

  • Open a new browser with long file names when a .SPA or .ZIP file is selected in SofaRun.
  • When a ROM / disk image / whatever is selected to be run, silently extract it to a file named after its hexadecimal CRC32 (which is already stored in the .SPA or .ZIP!). Do this only if that file does not already exist.
  • Run the CRC32-named ROM / disk image.

That approach requires a temporary directory for extracted files. We should have enough space on SD cards for that, and would not need to clean it too frequently.
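The extract-once scheme can be sketched on the PC side with Python's standard `zipfile` module, which exposes the CRC32 stored in the archive as `ZipInfo.CRC`. The function name `extract_cached` and the cache directory are invented for the example; the real SofaRun code would of course be written for MSX-DOS.

```python
import os
import zipfile


def extract_cached(archive_path: str, member: str, cache_dir: str) -> str:
    """Extract `member` to cache_dir/<CRC32 in hex> unless it is already there."""
    with zipfile.ZipFile(archive_path) as zf:
        info = zf.getinfo(member)  # directory entry, no decompression yet
        target = os.path.join(cache_dir, f"{info.CRC:08X}")
        if not os.path.exists(target):
            os.makedirs(cache_dir, exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as dst:
                dst.write(src.read())  # first run only: pay the unpacking cost
    return target
```

A CRC32 written as 8 hex digits fits exactly in the 8-character base name of an 8.3 file name, so the cached file needs no long-file-name support at all.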

So I just needed a "decent" speed on a normal Z80 (for the first-time extraction), and that's the case with gunzip. The main advantage of using the zip format instead of the new .spa one is that everybody could easily create their own game compilation on Windows/Mac/Linux using their favorite GUI or command-line zipper, and that's a huge advantage. Also, I have a working version of SofaPak for Win32, but I realize that the code that handles file scanning, subdirectories, file dates, etc. is not so easy to port to all platforms.

By Grauw

Ascended (8457)

18-10-2015, 20:22

Sounds pretty awesome :).