R800 DMA...

By PingPong

Prophet (3233)

04-05-2019, 20:09

I've read about the R800's DMA capabilities, and I wonder how this is supposed to work on the MSX turbo R.
Generally speaking, DMA starts with a request from an external device that then performs a high-speed data transfer.
During the transfer the CPU's bus pins are in a high-impedance state and it does not drive the bus. Practically, the CPU is kicked off the bus.

So I'm a bit surprised to see a CPU with DMA capabilities. In my mind, DMA on MSX could be between VDP and CPU: the VDP issues a DMA request and the CPU acknowledges it for the duration of the operation. This is how, for example, the Z80 DMA controller works in conjunction with the Z80.

Can anyone explain how it should work on MSX? What's the purpose? V9990 - R800 communication, for example?

By Sandy Brand

Master (141)

04-05-2019, 21:43

There are many ways to implement DMA controllers.

I think the very early DMA controllers were basically 'cycle stealing' on specific edges of the CPU clock signal, to access RAM while the CPU was busy doing some internal computation and/or data processing, at a moment when its address bus was stable. It is probably quite similar to how 'memory mapped I/O' used to work on hardware architectures that had VRAM slotted into the normal address range of the CPU (so both the VDP and the CPU are contending for the same RAM).

Yes, something like DMA between RAM and VRAM would make the most sense, but it could also just be from RAM to RAM (basically a faster version of LDIR that can run in parallel while the CPU is doing something else). It all depends on what functionality you need and what your hardware looks like (e.g. on MSX, RAM only has a 16-bit address space, whereas VRAM has 17 bits (and probably more on a V9990?), so it is not clear what the CPU should support for this out of the box).

Anyways, I am not a hardware expert by any means :)

By ducasp

Rookie (23)

05-05-2019, 01:11

Even though the R800 supposedly has DMA support, the only computer using it doesn't have the pins that drive it connected to any device or bus... So, does it work? Maybe it doesn't, and that is one reason it wasn't used... Perhaps it was going to be used with the V9978, but as that never saw the light of day... The cartridge port was kept the same and no extension to it was created... So, it doesn't work on the turbo R; it only adds to the "half baked" feeling about the turbo R.

By NYYRIKKI

Enlighted (5256)

05-05-2019, 08:22

PingPong wrote:

Can anyone explain how it should work on MSX? What's the purpose? V9990 - R800 communication, for example?

As it never happened, we can only speculate, but I believe this could be a plausible explanation:
- You use the FDC or something to switch the I/O ports to the DMA controller (bring down CSREG)
- You use OUT commands to set up the 24-bit destination address in the R800 DMA controller
- You bring CSREG up again
- You send SX, SY, NX, NY etc. to the V9978 and send the transfer command.
(or the other way around)

I don't have a reason to believe the DMA feature on the R800 is broken... It is documented and all... I believe it was never used just because Yamaha failed to deliver. On MSX I don't see much use for DMA other than the VDP. Maybe the other channel was reserved for mass storage as a future expansion.

By PingPong

Prophet (3233)

05-05-2019, 23:48

Maybe the original plan for the turbo R was more ambitious: a V9990+V9958 core plus CPU DMA support.
It is a pity that it never happened...
The R800 DMA is a sign, like the V9990+V9958 => V9978, that they planned something really powerful, able to compete with 16-bit machines... the first 16-bit MSX.

By NYYRIKKI

Enlighted (5256)

06-05-2019, 07:41

PingPong wrote:

Maybe the original plan for the turbo R was more ambitious: a V9990+V9958 core plus CPU DMA support.
It is a pity that it never happened...

Now that I think about it, I would be pretty surprised if they didn't consider DMA support... They wanted a big boost to the CPU and a new VDP with more colors, higher resolution & more memory... They must also have been aware that VRAM access through I/O ports had been criticized as a bottleneck of the MSX standard from day one... 1+1=2

By PingPong

Prophet (3233)

06-05-2019, 13:10

I see I/O-port-based VRAM access more as a psychological limit than an effective one. The need for random VRAM access is relatively rare, and sequential block access can (sometimes) be faster than CPU RAM access because you avoid incrementing the target pointer (saving cycles).
Just remember that an MSX with a Z80 and an I/O-port-based VDP is able to push, in the worst case, about 800 bytes of data during the vertical retrace period. Some Commodore 64 users have a lot of trouble pushing 1000 bytes of color RAM; that takes a lot more time than pushing 800 bytes on MSX.

Obviously an MSX should be coded like an MSX, not like a ZX Spectrum, which has VRAM directly under CPU control and forces a specific way of doing screen updates that does not work so well on MSX.

Apart from this, I do not know if the turbo R was an ASCII initiative dropped after a while or if it was fully a Panasonic idea. To me it appears they worked in the right direction, removing the legacy and limiting "features" of the early MSXes.

They planned:
1) DMA support (for me, targeted at VRAM<->RAM transfers)
2) V9990 (see 1)
3) A more flexible memory management model (24-bit addressing)
4) A faster CPU, compatible with Z80 machine code.

Their design is clear to me:
1) Remove limitations
- slow VDP (the V9990 was also port based, but there is an open door for direct VRAM access AFAIK, thus DMA)
- the plethora of mappers, slots and every other kind of ceremony
2) Move into the 16-bit era
- V9990 [meaning graphics capabilities more like 16-bit hardware]
- Fast Z80 with a 16-bit ALU and 24-bit addressing.

Unfortunately the project did not have a lot of success; otherwise it could have been the rebirth of MSX as a powerful 16-bit machine with a lot of the early limitations removed.

By NYYRIKKI

Enlighted (5256)

06-05-2019, 16:07

PingPong wrote:

Apart from this, I do not know if the turbo R was an ASCII initiative dropped after a while or if it was fully a Panasonic idea.

Earlier I was wondering about this as well, but the Aucnet NIA made it clear to me that it was a pure ASCII design... AFAIK this computer has nothing to do with Panasonic and yet it is 98% MSX tR as we know it... i.e. in the system software the only differences are a few added jumps and a PCMREC command that is broken because they used the wrong compiler. It seems to me Panasonic pretty much just took the reference design, added their own firmware menu and put it on sale... Just like they did with their MSX2 and MSX2+.

By PingPong

Prophet (3233)

06-05-2019, 19:36

I thought this because the only company that made an MSX turbo R was Panasonic, and it was never given a more logical name like MSX 3, so in my mind it was only a tweak of the MSX2+ standard made by Panasonic.

By Sandy Brand

Master (141)

06-05-2019, 22:20

PingPong wrote:

I see I/O-port-based VRAM access more as a psychological limit than an effective one. The need for random VRAM access is relatively rare, and sequential block access can (sometimes) be faster than CPU RAM access because you avoid incrementing the target pointer (saving cycles).

Good for writing big blobs of data for your 'office' software maybe. But a real pain for games that have to manipulate sprite attributes or want to do something interesting with line interrupts, for example. And the later V9938 with its wacky way of storing sprite colors somewhere else in VRAM makes it even worse.

You will have to implement little bits and pieces of overhead everywhere: keeping memory copies of VDP registers, having to disable interrupts because you need to make sure nothing accidentally starts writing to the same ports as you are, having to always 'disassemble' a 17-bit VRAM address in order to write it to the VDP ports, etc. All these things start to add up and make a big difference, not to mention that it makes everything a hassle.

Just look at what has been achieved by the game-dev and demoscene on a C64; it is quite clear that their hardware config allows for a lot more cool stuff.

Now, the thing is that MSX was not designed for this. It was designed with modularity and extendability in mind; thus it made sense to isolate specific pieces of hardware behind some I/O ports and provide clean interfaces through some BIOS functions, so that manufacturers have a bit of wiggle room in how they implement the standard.

I think the MSX concept was maybe a bit too far ahead of its time. They were trying to go for modularity at a time when hardware was still so limited that it was maybe hard to justify the overhead required to make it work.

By PingPong

Prophet (3233)

07-05-2019, 02:17

PingPong wrote:

I see I/O-port-based VRAM access more as a psychological limit than an effective one. The need for random VRAM access is relatively rare, and sequential block access can (sometimes) be faster than CPU RAM access because you avoid incrementing the target pointer (saving cycles).

Sandy Brand wrote:

Good for writing big blobs of data for your 'office' software maybe. But a real pain for games that have to manipulate sprite attributes or want to do something interesting with line interrupts, for example. And the later V9938 with its wacky way of storing sprite colors somewhere else in VRAM makes it even worse.

You are not seeing the real problem: this is not due to I/O-port-based access. It is mainly because the VDP stores sprite color information linked to the plane number instead of, more logically, to the pattern number.
This forces you to manipulate 16*32 bytes of VRAM instead of only writing a different pattern number into the SAT.

Even with random access you would be forced to do the same manipulation, and you would probably end up with a separate RAM buffer in order to perform the same operation without tearing or other glitches, so you would probably end up using an LDIR instead of an OTIR instruction. With VRAM access you pay for the initial setup of the VRAM pointer, but this is negligible compared to the time it takes to do a full 512-byte block move.
Even if you access individual planes by plane number (in order to touch only some sprite planes instead of the full SAT), the overhead is moderate, because to change an MSX2 sprite's colors you need 16 OUT operations plus an overhead of some 2 to 4 OUTs to set the VRAM pointer. Again, it does not change much.

Quote:

You will have to implement little bits and pieces of overhead everywhere: keeping memory copies of VDP registers, having to disable interrupts because you need to make sure nothing accidentally starts writing to the same ports as you are,

Memory copies of VDP registers would be necessary even with a memory-mapped scheme instead of port-based I/O.
Remember, registers are not memory locations. They are not required to behave like standard memory locations, and it is common for them to be write-only or read-only. The C64 is another example of this: some VIC-II registers modify one aspect of the display when written and return something else when read (the scanline interrupt, for example).
This has nothing to do with I/O versus memory mapping, only with the nature of hardware registers.
Plus, disabling interrupts is not required unless you write data in an interrupt handler. The MSX interrupt handler at 38h simply reads data from a register, and the VRAM pointer is not touched by this operation.
Again, this is not strictly a limit of the I/O or memory-mapped approach. It is a limit of the VDP, which does not provide a way to read the VRAM pointer. Basically, with a memory access scheme, if you use HL to point to VRAM both in the main program and in the interrupt service routine, you MUST SAVE HL and restore it on exit, otherwise you mess things up badly.
With the I/O scheme, instead of a PUSH HL / POP HL you need a way to get the current VRAM pointer from the VDP's internal state so you can restore it on exit from the interrupt. Unfortunately there is no way to do this. But that is not due to I/O-based access, only to a flaw in the VDP design.

Quote:

having to always 'disassemble' a 17-bit VRAM address in order to write it to the VDP ports, etc. All these things start to add up and make a big difference, not to mention that it makes everything a hassle.

I agree. This is the most stupid way to set the VRAM pointer address, but again it is not due to the I/O approach, only to the stupid way it was done. There are smarter ways it could have been done. Just remember, to set the VRAM pointer the VDP basically needs two operations: 1) output of the address, 2) a strobe for read or write.

The error was here: using a high bit of the second address byte (the 0x40 in the code below) to select a write or read operation. This forces the CPU to manipulate a bit by merging an address byte with the write or read flag. CPUs are not efficient when it comes to manipulating a single bit, so it is a waste of time compared to the size of the data you manipulate (1 bit).
One solution: different port addresses for write and read when setting the VRAM pointer, and voilà, the overhead of setting the VRAM pointer is gone. Instead of:

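ld a, l
out (0x99), a   ; low byte of the VRAM address
ld a, h
or 0x40         ; merge in the write flag
out (0x99), a   ; high address bits plus write flag

it becomes
ld c, PORT_FOR_WRITE   ; hypothetical "set pointer for write" port
out (c), l
out (c), h

or
ld c, PORT_FOR_READ    ; hypothetical "set pointer for read" port
out (c), l
out (c), h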

Plus, the 17-bit issue is a shitty thing that comes from the TMS roots and from the decision to merge the address bits with a read/write mode bit in the same byte. But nothing at the hardware level forces you to do things that way; it was a stupid choice.

I think the V9938 could have provided a better and cleaner way (see V9990) to set the VRAM pointer, instead of creating the stupid register that selects the 16K bank to operate on and thus requiring two extra OUTs to set the full 17-bit pointer.
But again, this is not an I/O issue, only the way they chose to do things.
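
For reference, a minimal sketch of the byte sequence behind that 17-bit setup, assuming the usual V9938 scheme: register 14 takes address bits 16-14, and a register write is the value followed by 0x80 plus the register number, all on port 0x99. The function name `vram_write_setup` is made up for illustration, and Python is used only to show the bit arithmetic.

```python
def vram_write_setup(addr):
    """Return the four bytes sent to port 0x99 to point the VDP at addr for writing."""
    if not 0 <= addr < (1 << 17):
        raise ValueError("VRAM address must fit in 17 bits")
    return [
        (addr >> 14) & 0x07,          # A16..A14, the value for register 14
        0x80 | 14,                    # "write register 14" command byte
        addr & 0xFF,                  # A7..A0
        ((addr >> 8) & 0x3F) | 0x40,  # A13..A8 with the write flag (bit 6) set
    ]


if __name__ == "__main__":
    # Example address chosen arbitrarily for illustration.
    print([hex(b) for b in vram_write_setup(0x1B600)])
```

The first two bytes are the two extra OUTs; the last two are the same pair that the original TMS-style setup already needed.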

Consider that VRAM access tends to be sequential by nature; in those situations the I/O-based VRAM pointer setup does not make a huge difference.

The only case where you notice a big bottleneck (but it is rare) is when you do a lot of random accesses to VRAM and each access modifies only a single byte (for example, a lot of PSET statements at random positions, or drawing a line). In these situations performance clearly suffers because of the overhead of setting the VRAM pointer.

But when you write a few sequential bytes (for example 4-8 bytes) per VRAM pointer setup, the overhead becomes acceptable. And the bigger the block you write, the lighter this overhead is.
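
To put rough numbers on this, a small sketch. The cycle costs are assumptions, not measurements: about 42 cycles for the five-instruction pointer setup shown earlier and 18 per OUTI, both counting the MSX M1 wait state. Check them against a Z80 timing table before relying on them; the function name `setup_overhead` is made up for illustration.

```python
def setup_overhead(setup_cycles, per_byte_cycles, n_bytes):
    """Fraction of the total transfer time spent setting the VRAM pointer."""
    if n_bytes <= 0:
        raise ValueError("n_bytes must be positive")
    total = setup_cycles + per_byte_cycles * n_bytes
    return setup_cycles / total


if __name__ == "__main__":
    # Assumed costs: 42 cycles for the pointer setup, 18 cycles per byte written.
    for n in (1, 4, 8, 512):
        print(n, round(setup_overhead(42, 18, n), 3))
```

With these assumed costs, the setup is 70% of the time for a single byte, roughly 23% for an 8-byte run, and under 0.5% for a 512-byte block.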
