VDP... and some really deep stuff. :)

Page 2/2

By salutte

Master (137)


14-09-2021, 07:46

This is seriously cool! I had a student working on a system to squash machine learning models, and he got quite nice results. Our resulting model was so compact that for the last two years I have played with the idea of porting it to the MSX (and doing image classification at approx. 1 image per week). But using the VDP would be awesome! I'll check the requirements and see if I can make it work with only 8-bit x 8-bit multiplications.

(Really, I should finish some projects before starting new ones!)

By Grauw

Ascended (10021)


14-09-2021, 14:37

Mining bitcoins with the VDP? :D

By NYYRIKKI

Enlighted (5847)


16-09-2021, 10:30

I feel that this is getting a bit out of hand, so maybe it is better to explain how this black magic really works. :)

So... Let's take a closer look at this 8-bit multiply program.

I could be talking about bits, but as our "bits" are actually pictures, I would rather call them "areas": each one is a whole memory area containing lots of individual bits, and every bit is handled in exactly the same way as its counterparts in the same area.

So... Using variables, I have reserved some areas of the screen to implement our "VDP processor". We have the first 8-bit input number AX(7) & AY(7), the second 8-bit input number BX(7) & BY(7), the 16-bit result RX(15) & RY(15), and a common carry area CX & CY. Apart from these basic building blocks we also have a mask area MX & MY and two temporary storage areas, FX & FY and GX & GY.

If we look at what the program does, lines 280-310 are an optimized first iteration. At the very start we calculate (A AND 1)*B, which results in either B or 0. On the VDP we can't take the "IF A AND 1 THEN R=B ELSE R=0" approach, because these are areas and the VDP does not have "IF". This is why we AND A(0) against all bits of B. Result bits 8-15 are just cleared. (Sorry, for performance I have used fixed assumptions about where the areas are located.) Later in the program we use the mask area mentioned above to store this A(x) AND B(x) result, so that the number to be added becomes 0 when we don't really want to add it. This is why we always add the mask to the result instead of B(x). Remember: we also need to add the 0, because other bits in the same area might have 1 as their value, so the lack of "IF" is not really an issue.
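To make the "no IF" trick concrete, here is a rough sketch in Python. It is only a model and not the BASIC program discussed here: each "area" is assumed to be a Python int whose bits stand in for the pixels of one area, and the helper names `to_planes` and `from_planes` are made up for this illustration.

```python
# Model only: one Python int per "area"; bit n of the int is pixel n.
# All names here are hypothetical, not taken from the original program.

def to_planes(values, width):
    """Split a list of numbers into `width` areas, least significant bit first."""
    planes = []
    for bit in range(width):
        area = 0
        for lane, value in enumerate(values):
            area |= ((value >> bit) & 1) << lane
        planes.append(area)
    return planes

def from_planes(planes, lanes):
    """Rebuild the list of numbers from the areas."""
    return [
        sum(((area >> lane) & 1) << bit for bit, area in enumerate(planes))
        for lane in range(lanes)
    ]

a_values = [0, 1, 2, 3]          # four independent multiplications at once
b_values = [200, 17, 99, 255]
A = to_planes(a_values, 8)
B = to_planes(b_values, 8)

# First iteration: (A AND 1)*B becomes "A(0) AND B(x)" for every bit x.
# Result bits 8-15 are simply cleared.
R = [A[0] & B[x] for x in range(8)] + [0] * 8

first = from_planes(R, 4)
print(first)  # [0, 17, 0, 255]
```

Every pixel gets the same AND, and the pixels where A(0) is 0 simply end up with 0, which is the whole point of using a mask instead of a branch.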

Lines 330-370 are an optimized version of BIT0 of the second iteration. In Z80 terms we are doing ADD A,B and not ADC A,B, so we ignore the first input carry. As we use the mask instead of B, the result is BIT0 XOR M and the carry is BIT0 AND M. In electronics this is known as a "half adder".
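The half adder can be sketched the same way in Python. Again this is just a model under the assumption that an area is an int with one bit per pixel; `half_adder` is a hypothetical name.

```python
def half_adder(r, m):
    """Add two areas with no carry in. Returns (sum area, carry area)."""
    return r ^ m, r & m

# Four pixels covering all input combinations (bit 0 is the rightmost digit).
r = 0b0011
m = 0b0101
total, carry = half_adder(r, m)
print(format(total, "04b"), format(carry, "04b"))  # 0110 0001
```

Only the pixel where both inputs are 1 produces a carry, and its sum bit is 0.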

Lines 390-480 are the "full adder", which is a bit more complex. Yet again we calculate the mask (M), but to understand the full adder concept, it is better to take a look at what one looks like:

When you try to convert a circuit like this to the VDP there are a few differences. In electronics the operations all happen in parallel, but with the VDP you need to consider the execution order so that all the needed input signals have been calculated. Also, whenever there is a signal fork, you need to duplicate the area. However, you also need to think about which area each result will be stored in, in order to minimize the number of areas needed and to avoid moving areas around needlessly, which would cause an unnecessary performance hit. For this purpose it is a good idea to work through the picture from right to left and ask "from what signals is this result calculated?" in order to pick meaningful temporary storage areas.
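As a rough illustration of the ordering problem, here is one possible sequence of steps for a full adder, written in Python. This is my own assumed ordering and not necessarily the one used in lines 390-480. The names F and G play the role of the temporary areas, each statement stands for one copy with a logical operation, and areas are again modelled as ints.

```python
def full_adder(r, m, c):
    """One possible step order. Returns (new result area, new carry area)."""
    f = r        # fork: duplicate R, because it is needed twice
    r ^= m       # R = R XOR M
    f &= m       # F = R AND M (first carry term)
    g = r        # fork: duplicate R XOR M
    g &= c       # G = (R XOR M) AND C (second carry term)
    r ^= c       # R = R XOR M XOR C, the sum bit
    c = f        # C = F
    c |= g       # C = F OR G, the carry out
    return r, c

# Eight pixels covering all eight input combinations.
r_in, m_in, c_in = 0b00001111, 0b00110011, 0b01010101
sum_area, carry_area = full_adder(r_in, m_in, c_in)
```

Note how the old carry has to be consumed (the `g &= c` and `r ^= c` steps) before the carry area is overwritten, which is exactly the kind of dependency you don't have to think about in a real circuit.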

In this second picture I added the steps I chose to take:

... and that is about all I have to say about this program. On the Z80 you usually rotate the bits, but as we are talking about whole memory areas, it is faster to rotate the area pointers instead. If you feel that calculating pointers is "cheating", you can unroll the loops to get rid of the calculations. I hope you now have a better understanding of how this "processing on the VDP" works.
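Putting the pieces together, the whole multiply could be sketched in Python roughly like this. It is a model under the same assumptions as before (one int per area, hypothetical helper names) and it skips the first-iteration and BIT0 optimizations, so it does not mirror the BASIC listing line by line. The index `i + x` plays the role of the rotated area pointer: nothing is ever shifted.

```python
def to_planes(values, width):
    """Split a list of numbers into `width` areas, least significant bit first."""
    return [
        sum(((v >> bit) & 1) << lane for lane, v in enumerate(values))
        for bit in range(width)
    ]

def from_planes(planes, lanes):
    """Rebuild the list of numbers from the areas."""
    return [
        sum(((area >> lane) & 1) << bit for bit, area in enumerate(planes))
        for lane in range(lanes)
    ]

def multiply_planes(A, B):
    """Shift-and-add on areas. A and B are lists of 8 areas, result has 16."""
    R = [0] * 16
    for i in range(8):
        carry = 0
        for x in range(8):
            m = A[i] & B[x]                  # mask: B(x) or 0, no IF needed
            r = R[i + x]                     # "pointer rotation" instead of a shift
            R[i + x] = r ^ m ^ carry         # full adder: sum
            carry = (r & m) | (carry & (r ^ m))  # full adder: carry out
        R[i + 8] = carry                     # this area is still untouched here
    return R

a_values = [0, 1, 255, 137, 16]
b_values = [0, 255, 255, 21, 16]
R = multiply_planes(to_planes(a_values, 8), to_planes(b_values, 8))
products = from_planes(R, len(a_values))
print(products)  # [0, 255, 65025, 2877, 256]
```

All five multiplications are computed by the same sequence of AND, OR and XOR steps, which is why the work per step does not depend on how many pixels the areas contain.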
