CTRL-STOP bug in the DOS2 environment

By Eugeny_Brychkov

Paladin (874)

13-09-2017, 16:50

While fixing some bugs in my GR8NET DOS1 built-in implementation, I came across exactly the same issue with Nextor. Initially I thought it was an issue in the Nextor subsystem (because I use it in the device), but later I discovered that generic MSX-DOS2 exhibits it too. It may also be that some "bad programming practice" comes into play, but I have not seen it defined as "bad" anywhere, so I cannot judge.

So, here are the technical details.

When function 6 (DIRIO) of MSX-DOS is called with E=0FFh, it checks the keyboard and returns a character if one has been typed. This system call is located in the DOS2 ROM and works fine, with one exception: when CTRL-STOP is pressed, it does not return any typed character to the application and processes an exception instead.

This exception is handled by the "abort handler". When CTRL-STOP is pressed, execution is passed to the address stored at F325 (CTRLCAD): the system takes the pointer from (F325) and jumps to it.

Now, there are two scenarios:
1. (F325) contains a pointer to MSX Disk BASIC code when the machine is in BASIC. The mechanism works fine, because the address pointed to by F325 can only be invoked from the Disk-ROM, so Disk-ROM code is in CPU bank 1 and the system jumps to the correct address;
2. (F325) contains a pointer to MSXDOS code in the high memory area. After properly jumping to this area, MSX-DOS starts moving some data back and forth throughout almost all of the MSX-DOS RAM space.

Then the issue happens. I was tracing Nestor's TCP/IP UNAPI PING.COM application. It does a simple thing: it detects the UNAPI location, enables the respective slot in CPU bank 1 (4000-7FFF), and then calls UNAPI directly without performing far calls. For example, RAM is in slot 3-2 and UNAPI is in slot 1.

At the point when it was waiting for user input, I pressed CTRL-STOP, and a major failure occurred. When the abort handler starts, it leaves the machine configuration with slot 1 switched on for CPU bank 1, so there is no DOS RAM there; it then tries to copy some data and code into that bank and of course fails.
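The failure described above can be illustrated with a toy model. This is only a sketch in Python, not real MSX code: the `Slot` and `Machine` classes and the `abort_handler_copy` function are hypothetical names invented for illustration, and the model assumes that writes to a ROM slot are silently ignored.

```python
# Toy model of MSX slot selection: four 16K CPU banks, each mapped to one slot.
# All names here are hypothetical; this is not the real DOS2 abort handler.
PAGE_SIZE = 0x4000


class Slot:
    def __init__(self, name, writable):
        self.name = name
        self.writable = writable
        self.mem = bytearray(0x10000)

    def write(self, addr, value):
        if self.writable:  # assumption: a ROM slot silently ignores writes
            self.mem[addr] = value

    def read(self, addr):
        return self.mem[addr]


class Machine:
    def __init__(self, ram):
        # Default DOS state: RAM selected in all four CPU banks.
        self.banks = [ram, ram, ram, ram]

    def select(self, bank, slot):
        self.banks[bank] = slot

    def write(self, addr, value):
        self.banks[addr // PAGE_SIZE].write(addr, value)

    def read(self, addr):
        return self.banks[addr // PAGE_SIZE].read(addr)


def abort_handler_copy(machine, dest, data):
    """Stand-in for the handler moving data around; True if the copy landed."""
    for i, b in enumerate(data):
        machine.write(dest + i, b)
    return all(machine.read(dest + i) == b for i, b in enumerate(data))


ram = Slot("3-2", writable=True)     # RAM in slot 3-2, as in the example above
unapi = Slot("1", writable=False)    # UNAPI ROM in slot 1
m = Machine(ram)
m.select(1, unapi)                   # what PING.COM does for bank 1 (4000-7FFF)
print(abort_handler_copy(m, 0x4100, b"\x01\x02"))  # copy into bank 1
print(abort_handler_copy(m, 0x8100, b"\x01\x02"))  # copy into bank 2
```

In this model the copy into bank 1 does not land because the bank is still mapped to the ROM slot, while the copy into bank 2 works, which matches the symptom seen in the trace.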

So what is it?

1. Bad application programming practice? Should PING.COM not keep slot 1 permanently selected in CPU bank 1, and use RST 30h every time instead? And if so, why is this scenario not allowed?

2. Bad design of the DOS2 Disk-ROM and MSXDOS? They must switch RAM into all the CPU banks before trying to operate on them.

I would like to hear your comments about this issue. Thank you.

By NYYRIKKI

Enlighted (4743)

13-09-2017, 17:54

Hmm... I would not even call this "bad design of DOS2" but rather a bug in DOS2... This view is mainly based on the DOS2 documentation of "DEFINE ABORT EXIT ROUTINE (63H)". This is not 100% the same case, since I don't think the user-defined abort routine was used here, but for a user-defined routine the documentation says "The user abort routine will be entered with the user stack active" blah, blah, blah "and with the whole TPA paged in." I would interpret that to mean that all of the banks should point to RAM in case of an abort, and I would expect the default behavior to be the same, as it is not documented otherwise.

By Eugeny_Brychkov

Paladin (874)

13-09-2017, 18:25

I traced the function in the same scenario I explained above, and the "abort exit routine" address set by function 63h has no effect: the problem occurs much earlier and in another part of the system code.

By NYYRIKKI

Enlighted (4743)

13-09-2017, 18:39

That might be, but as an answer to your question I would still call it an error, since the end result is different from the documentation.

Now that the cause of the error is known, I think there is no real problem: it should be quite straightforward to work around it by redefining F325 before reading the keyboard... or by using a BIOS routine instead.
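The idea of redefining F325 can be sketched as a toy model. This is Python rather than Z80 code, and everything except the F325 address itself is an assumption made for illustration: the `Machine` class, the handler addresses 0xF000 and 0xC000, and the rule that the default handler only works when bank 1 holds RAM.

```python
# Toy model of hooking the pointer stored at F325 (CTRLCAD in this thread).
# Handler addresses and class names are hypothetical, chosen for illustration.
CTRLCAD = 0xF325


class Machine:
    def __init__(self):
        self.bank1 = "RAM"
        self.log = []
        # Assumed: the default abort handler lives somewhere in high memory.
        self.handlers = {0xF000: self.default_abort}
        self.vector = {CTRLCAD: 0xF000}

    def default_abort(self):
        # Assumed failure rule: the handler needs RAM in CPU bank 1.
        self.log.append("ok" if self.bank1 == "RAM" else "crash")

    def ctrl_stop(self):
        # The system takes the pointer from (F325) and jumps to it.
        self.handlers[self.vector[CTRLCAD]]()


def install_hook(machine, hook_addr=0xC000):
    """Point F325 at a routine that restores bank 1, then chains to the old one.

    The hook must live outside bank 1, hence the hypothetical 0xC000 address.
    """
    old = machine.vector[CTRLCAD]

    def hook():
        machine.bank1 = "RAM"      # put RAM back before DOS touches bank 1
        machine.handlers[old]()    # continue with the original handler

    machine.handlers[hook_addr] = hook
    machine.vector[CTRLCAD] = hook_addr


m = Machine()
install_hook(m)
m.bank1 = "UNAPI ROM"   # application selects the UNAPI slot in bank 1
m.ctrl_stop()           # user presses CTRL-STOP during the keyboard read
print(m.log, m.bank1)
```

Whether the real handler can be chained this simply is not confirmed anywhere in this thread; the sketch only shows the order of operations the workaround relies on.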

By Grauw

Enlighted (6267)

13-09-2017, 18:59

When you switch slots or mapper segments, you must always switch them back to the original values before returning from the program. Even though the MSX-DOS2 environment reference says “A program [...] need not restore either the slot selections or the RAM paging before it exits, since COMMAND2.COM will handle this.”, this is not true. (Why, I do not know.)

Therefore, if you call any BDOS routines which can be aborted by the user or by system errors while the TPA is not in its default state, you must define an abort routine to handle this, and this routine must restore the pages to their original state before returning.
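As a rough sketch of that rule (a Python model, not VGMPlay's actual code; the `Tpa` class and `run_program` function are hypothetical, and the model assumes the user abort routine is really invoked on abort), the program saves the page state, registers a routine that restores it, and uses that same routine on the normal exit path:

```python
# Toy model of "define an abort routine that restores the pages".
# Tpa and run_program are hypothetical names, not part of MSX-DOS.
class Tpa:
    def __init__(self):
        self.pages = ["RAM", "RAM", "RAM", "RAM"]
        self.abort_routine = None

    def define_abort_routine(self, routine):
        # Stands in for the "define abort exit routine" call (63H).
        self.abort_routine = routine

    def abort(self):
        # Assumed behaviour: the user routine is called when a BDOS call aborts.
        if self.abort_routine is not None:
            self.abort_routine()


def run_program(tpa, user_aborts):
    saved = list(tpa.pages)          # remember the original page state

    def restore():
        tpa.pages[:] = saved         # put every page back as it was

    tpa.define_abort_routine(restore)
    tpa.pages[1] = "UNAPI ROM"       # program switches page 1 for direct calls

    if user_aborts:                  # e.g. CTRL-C during a BDOS keyboard read
        tpa.abort()
        return "aborted"

    restore()                        # normal exit path restores as well
    return "finished"


tpa = Tpa()
print(run_program(tpa, user_aborts=True), tpa.pages)
```

Using one `restore` routine for both exits keeps the two paths from drifting apart.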

I do this in VGMPlay and the abort routine gets called when I press CTRL-C.

By NYYRIKKI

Enlighted (4743)

13-09-2017, 19:20

Grauw wrote:

[...] need not restore either the slot selections or the RAM paging before it exits, since COMMAND2.COM will handle this.”, this is not true. (Why, I do not know.)

Even as an idea this sounds pretty broken... COMMAND2.COM is >16K, so how could you leave the handling of slots to a program that can't even get loaded before the slot configuration is fixed?
