# VDP... and some really deep stuff. :)

Page 1/2

First a little warning: This is pretty twisted coding stuff again.

It started when I was thinking about GPUs. Modern GPUs are real powerhouses that can take a lot of calculation work off the CPU. I was thinking that it would be really nice to have this kind of GPU on MSX to do the hard work, but unfortunately on MSX2 we have only this super simple blitter that can't even calculate 1+1... but then I stopped for a moment to think about it a bit harder and figured out that I'm most likely wrong...

From this thought I ended up writing this stupid little BASIC program. It is horribly stupid & slow, but it definitely proved that my initial thought was wrong... The VDP can actually calculate 1+1 and even a lot more!... You just need to change the angle from which you look at the VDP as a programmer.
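The BASIC program itself is not reproduced in this post, so here is only a rough Python sketch of the general idea, assuming the trick relies on the blitter's logical operations (copy, AND, OR, XOR) applied between a source and a destination rectangle. The `blit` helper and the tiny four-pixel "VRAM rectangles" are made up for illustration; they only simulate what a logical-operation copy does to each pixel byte.

```python
# Simulated VRAM rectangles: each list element is one SCREEN 8 pixel (one byte).
# blit() is a hypothetical stand-in for a blitter copy with a logical operation.

def blit(src, dst, op):
    """Combine src into dst pixel by pixel and return the new dst contents."""
    ops = {
        "IMP": lambda s, d: s,        # plain copy
        "AND": lambda s, d: s & d,
        "OR": lambda s, d: s | d,
        "XOR": lambda s, d: s ^ d,
    }
    return [ops[op](s, d) & 0xFF for s, d in zip(src, dst)]

# Four independent 1-bit additions, one per pixel: 1+1, 0+1, 1+0, 0+0.
a = [1, 0, 1, 0]
b = [1, 1, 0, 0]

sum_bit = blit(a, b, "XOR")    # low bit of a+b
carry_bit = blit(a, b, "AND")  # high bit of a+b

results = [2 * c + s for s, c in zip(sum_bit, carry_bit)]
print(results)  # [2, 1, 1, 0]
```

XOR gives the sum bit and AND gives the carry bit, so two copy commands are enough for a 1-bit "1+1", and every pixel in the rectangle is added at the same time.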

Act 2:

The next logical thought was: OK... Multiplying is slow even on the Z80, so if the VDP is able to calculate 1+1, it should in theory be able to multiply as well, right?

Next I wrote another test program... This very simple BASIC program multiplies a 2-bit value by another 2-bit value and displays the 4-bit result. It is not impressive in any way, but the point is that it is the VDP that does the actual multiplication. You can control the individual bits with keys 0-3 and easily see how it is done. Rather than being impressive, it was more of a proof of concept and thought experiment: if you can multiply 2-bit values, you can also multiply 4-bit, 8-bit or 16-bit values if you like...

You are probably now thinking something like "Congratulations... You managed to find the most ridiculous, slow and non-user-friendly way to multiply a few bits together"... That is not a bad assessment, but it is not 100% of the story... Although visually it looks like I multiply just a few bits, technically I'm actually multiplying 16x16 pictures! As the example is written for SCREEN 8 with 8 bits/pixel, this program is actually doing 16 x 16 x 8 = 2048 multiplications that could all have individual parameters... OK, OK, the bits are still in a funny, 90-degrees rotated order in slow VRAM. I can't really imagine a real-world situation where this could be useful, but I found it really fascinating, so...
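To make the "2048 multiplications" claim concrete, here is a Python sketch of a 2-bit x 2-bit multiplier built only from AND and XOR over whole 16x16 pictures. It is an illustration under assumptions, not the actual BASIC program: the helper names, the way lanes are packed into pixels and the exact sequence of operations are all hypothetical. Each operand bit lives in its own 256-byte picture, and each of the 8 bits in a pixel is a separate multiplication.

```python
import random

PIXELS = 16 * 16      # one 16x16 picture
LANES = PIXELS * 8    # 8 bits per SCREEN 8 pixel -> 2048 independent lanes

def pack(values, bit):
    """Build the picture that holds bit number `bit` of every operand."""
    plane = [0] * PIXELS
    for lane, v in enumerate(values):
        plane[lane // 8] |= ((v >> bit) & 1) << (lane % 8)
    return plane

def unpack(planes):
    """Collect one number per lane from a list of result pictures."""
    out = []
    for lane in range(LANES):
        v = 0
        for bit, plane in enumerate(planes):
            v |= ((plane[lane // 8] >> (lane % 8)) & 1) << bit
        out.append(v)
    return out

def AND(x, y):
    return [p & q for p, q in zip(x, y)]

def XOR(x, y):
    return [p ^ q for p, q in zip(x, y)]

def multiply_2bit(a0, a1, b0, b1):
    """Return the four product-bit pictures p0..p3 for (a1 a0) * (b1 b0)."""
    p0 = AND(a0, b0)
    t1 = AND(a1, b0)
    t2 = AND(a0, b1)
    p1 = XOR(t1, t2)
    c1 = AND(t1, t2)   # carry out of bit 1
    t3 = AND(a1, b1)
    p2 = XOR(t3, c1)
    p3 = AND(t3, c1)   # only set for 3 * 3 = 9
    return [p0, p1, p2, p3]

random.seed(1)
a = [random.randrange(4) for _ in range(LANES)]
b = [random.randrange(4) for _ in range(LANES)]

planes = multiply_2bit(pack(a, 0), pack(a, 1), pack(b, 0), pack(b, 1))
products = unpack(planes)

assert products == [x * y for x, y in zip(a, b)]
print(len(products))  # 2048
```

The eight picture-wide operations in `multiply_2bit` never look at individual numbers, which is why the cost stays the same whether the pictures hold one multiplication or 2048.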

Act 3:

I threw together this really messed up, overcomplicated BASIC program to do 8-bit * 8-bit calculations... As most of the time the CPU is just waiting for the VDP anyway, I did not put any effort into optimizing the BASIC part of this test code...
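For readers who want to see how the 2-bit case can scale up, below is a Python sketch of a generic shift-and-add multiplier that uses only AND, OR and XOR on whole bit-planes. This is not the program from this post, just one plausible way to do it. As a simplification, each bit-plane is modelled as a single Python integer whose bits are the parallel lanes; all function names are hypothetical.

```python
import random

def to_planes(values, bits):
    """Plane k has lane i set when bit k of values[i] is set."""
    planes = []
    for k in range(bits):
        plane = 0
        for lane, v in enumerate(values):
            plane |= ((v >> k) & 1) << lane
        planes.append(plane)
    return planes

def from_planes(planes, count):
    """Rebuild one number per lane from a list of bit-planes."""
    return [
        sum(((plane >> lane) & 1) << k for k, plane in enumerate(planes))
        for lane in range(count)
    ]

def multiply_planes(a, b):
    """Shift-and-add multiply using only AND, OR and XOR on planes."""
    width = len(a) + len(b)
    acc = [0] * width
    for j, bj in enumerate(b):
        carry = 0
        for k in range(width - j):
            # Partial product bit: a[k] AND b[j], shifted left by j places.
            addend = (a[k] & bj) if k < len(a) else 0
            old = acc[j + k]
            acc[j + k] = old ^ addend ^ carry                 # full-adder sum
            carry = (old & addend) | (carry & (old ^ addend)) # full-adder carry
    return acc

COUNT = 6400  # arbitrary number of parallel multiplications
random.seed(2)
xs = [random.randrange(256) for _ in range(COUNT)]
ys = [random.randrange(256) for _ in range(COUNT)]

result_planes = multiply_planes(to_planes(xs, 8), to_planes(ys, 8))
results = from_planes(result_planes, COUNT)

assert results == [x * y for x, y in zip(xs, ys)]
```

A real VDP version would probably need far fewer operations than this naive ripple-carry loop, so the sketch says nothing about actual command counts or speed.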

At first the test results were pretty promising... For the measurements I took a set of 6400 integer multiplications as my test set. Poor MSX-BASIC took about 22.5 seconds to crawl through the task, while the other MSX-BASIC program, using the VDP to calculate them all in parallel, took only 6.5 seconds...

I knew I was fooling myself anyway, so I modified the program to be X-BASIC compatible to get more meaningful results and yeah... After that the VDP version took only 3 seconds, but the CPU version of the test was boosted to 2.5 seconds.

So, yes... No surprises here... It is not quite as fast as using the CPU, and the results are hard to fetch and use. In theory this can almost double the MSX2 number-crunching speed, as the CPU and VDP can work independently at the same time, but even my twisted mind can't imagine a real-world application where this approach could be used for anything even semi-useful... I kind of knew the end result already at the start, but it was still interesting to try and I wanted to share these results with you for comments.


So, if I understand correctly, you are making mathematical operations on VRAM data by using the VDP blitter logical operators ... Truly amazing!

I'm also convinced it's not very useful, as setting a VDP command takes a minimum of 250 cycles, seeing that you need several of these to achieve the result. Whereas an unsigned 8bits/16 bits multiplication (for example) takes less than 300 cycles on the Z80. Unless (but I'm unsure it's the case here), you can multiply several numbers at the same time with the same blitter operation.

That's indeed really deep stuff !

I've got a 32-bit Arduino hooked up to my joystick port which can do all sorts of realtime calculations. Every time I communicate with it, I can only send 2 bits at a time (and receive 6 bits at a time). This makes I/O so slow that I'd rather just precalculate and store results in tables instead of having a co-processor do it in real time.

The VDP approach is kind of the same, except that it uses hardware that is already present. It will be slow anyway because of the I/O. I guess you came to the same conclusion. I am not sure if you were including I/O in the calculation time.

The only way this would work (I suppose) is having an Arduino in a cartridge, so it can access all MSX I/O ports. You would be running the program on the Arduino, but using the MSX for I/O. If there were a way to have the MSX upload the code, the Arduino could function as the main processor for that piece of software, and you could still load it from disk.

Metalion wrote:

I'm also convinced it's not very useful, as setting a VDP command takes a minimum of 250 cycles, seeing that you need several of these to achieve the result. Whereas an unsigned 8bits/16 bits multiplication (for example) takes less than 300 cycles on the Z80. Unless (but I'm unsure it's the case here), you can multiply several numbers at the same time with the same blitter operation.

The time that it takes to set up the VDP is not that bad... If we use e.g. these 250 cycles and the middle experiment that takes 14 commands, this totals 14 x 250 = 3500 cycles... So in the case of 6400 multiplications it is less than a cycle per multiplication (3500 / 6400 ≈ 0.55)! Access times to read/write VRAM are a bit more of a concern, but the real horror from a time perspective is these "rotate 90" format conversions that need to be done in both directions to be Z80 compatible. (I call the problem "rotate 90" as it reminds me very much of the problem of rotating a SCREEN 2 bitmap by 90°)
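The "rotate 90" conversion can be pictured as transposing an 8x8 bit matrix: 8 normal bytes go in, and 8 bytes come out where byte k holds bit k of every input. The Python sketch below shows that transposition for one group of 8 operands. The exact VRAM layout used by the test programs is not given in this thread, so the grouping into blocks of 8 and the `rotate90` name are assumptions for illustration.

```python
def rotate90(block):
    """Transpose an 8x8 bit matrix given as 8 bytes.

    Bit `row` of output byte `col` equals bit `col` of input byte `row`.
    """
    if len(block) != 8:
        raise ValueError("expected exactly 8 bytes")
    out = [0] * 8
    for row, byte in enumerate(block):
        for col in range(8):
            out[col] |= ((byte >> col) & 1) << row
    return out

# Eight ordinary Z80-style bytes...
operands = [0, 1, 2, 3, 4, 5, 6, 255]

# ...become eight bit-plane bytes: planes[k] collects bit k of each operand.
planes = rotate90(operands)

# The same routine converts results back, since transposing twice is a no-op.
restored = rotate90(planes)
assert restored == operands
```

Every operand bit has to be touched individually in each direction, which hints at why this conversion, rather than the blitter setup, dominates the cost on the Z80 side.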

So if I understand correctly, in order to make those 6400 multiplications, you need to :
. transfer the 6400 first operands to the VRAM,
. transform the 6400 second operands (90 degrees transformation), then transfer them in VRAM,
. do the blitter commands.
Correct ?

Ha! Very interesting! Regardless of whether it's useful or not, I think it's a very cool experiment (and who knows! maybe someone finds a use! haha). Also, I wonder if this opens the door to finding even further ways in which the VDP can make calculations (and whether different generations of VDP can do more or fewer calculations, and at different speeds).

First steps of GPU accelerated machine learning in MSX.

Interesting! Very much the type of thing I was looking for in this thread I posted 13 years ago :).

Ah, checking that thread reminded me of something I always forget! Even on a first generation MSX the VDP can be used to complement the Z80, at least as a "slow access" additional memory buffer. If I remember correctly, there is quite a bit of unused VRAM in Screen 2 that can come in handy (and if not all sprites are used, there is even more VRAM available). And later generation VDPs probably have even more memory that can be used by the Z80.

Metalion wrote:

So if I understand correctly, in order to make those 6400 multiplications, you need to :
. transfer the 6400 first operands to the VRAM,
. transform the 6400 second operands (90 degrees transformation), then transfer them in VRAM,
. do the blitter commands.
Correct ?

Well, not quite... You need to rotate everything before doing any calculations... and if you plan to use the data again on the Z80 you also have to rotate the results. (Unless you want to run your whole program on the VDP)

One more little remark: if this were considered a viable solution for anything, then using SCREEN 6 would make much more sense from the CPU's point of view, as it would need to divide the numbers into only 2 pools instead of 8. For the VDP it makes no difference, as the code is the same. I think I made the examples in SCREEN 8 mostly because on MSX2+ you can also do SCREEN 8 logic in text modes.
