GFX9000 Faster but not fast...

Page 1/10

By GhostwriterP

Hero (626)

04-11-2005, 16:45

Whoooo, here comes a lot of text your way!

Well, let's start by saying that compared with a V9958 the V9990 is without doubt
faster. But how much faster? To figure that out, I ran a few small tests concerning
the copy command LMMM. Of course only on a few different display settings, since
the test is more or less targeted at my "The Revenge of The Last Dragon" project.

Second, for those who compare the V9990 with the SNES or Genesis (Megadrive): I have
to say that those video processors have all kinds of handy features, which makes
it very hard to compare it to either one of them. Surely I believe it is possible to make
something like Sonic, but I will get back to that later.

Third, multilayer scrolling is, other than by using (obviously) P1, not possible fullscreen
at a 60 Hz framerate. For those who believe differently, do not hesitate to prove me
wrong! 3-layer scrolling in P1 is definitely not possible. Why? Just read on...
(PS: also not at 50 Hz, and I mean no parallax techniques, just simple
multi-directional independent scrolling.)

Fourth, well, let's just say that people have the idea that GFX9000 has to look better
than MSX2, so let's go for 8-bit color depth at a display resolution of 512x424. The result
is that you have to copy 8 times as much (4x for the surface and 2x for the color depth).
Question: can the VDP handle that much data? If an original game runs in screen 8 with
a lot of TPSETs in it, it might very well be possible (I am actually sure of it). But please
keep in mind that copying an entire screen in one interrupt is, as far as I know, a fable.
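
As a rough check on the "8 times" figure and on the full-screen claim, the arithmetic can be sketched as follows. The baseline of a 256x212, 4-bit screen is an assumption (it is what the 4x surface and 2x color depth factors imply), as is taking B1 to be 256x212. The 19712 bytes per frame is the B1 60 Hz 8-bit LMMM result from the tables in this post.

```python
import math

def screen_bytes(width, height, bits_per_pixel):
    """Bytes needed to store one full screen at the given color depth."""
    return width * height * bits_per_pixel // 8

# Assumed baseline: a 256x212 screen at 4 bits per pixel.
baseline = screen_bytes(256, 212, 4)
hires = screen_bytes(512, 424, 8)
ratio = hires // baseline

# Measured LMMM throughput per frame (B1, 60 Hz, 8-bit), taken from the tables.
measured_bytes_per_frame = 19712
b1_screen = screen_bytes(256, 212, 8)  # assumes B1 is 256x212
frames_for_b1_screen = math.ceil(b1_screen / measured_bytes_per_frame)

print("baseline bytes:", baseline)
print("512x424 8-bit bytes:", hires, "ratio:", ratio)
print("frames to copy one B1 8-bit screen:", frames_for_b1_screen)
```

Under these assumptions even a single 256x212 8-bit screen needs several frames of LMMM copying, which fits the remark that a full-screen copy in one interrupt is a fable.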

I know, I know... I am a bit negative, but now follow a few tables with the test results.
They might be useful for all of you who are thinking about making a GFX9000 application.
The test: how many 16x16 tiles can be copied in one interrupt? Each table shows the color
depth, the number of 16x16 tiles, and the number of bytes moved by the LMMM copy command.

 
B1 60 Hz

color  | 16x16 | bytes
----------------------
8-bit  |   77  | 19712
4-bit  |  106  | 13568

B1 50 Hz

color  | 16x16 | bytes
----------------------
8-bit  |   94  | 24064
4-bit  |  127  | 16256

B1 60 Hz Overscan (192x212)

color  | 16x16 | bytes
----------------------
8-bit  |   58  | 14848
4-bit  |   85  | 10880

Now a few things that I would like to point out. Obviously overscan eats away some data
transfer time. But more interesting: the number of bytes in 8-bit mode is larger than
in 4-bit mode, yet in terms of copied surface the 4-bit mode comes out as the winner.
There is a good possibility that these numbers are higher when it is done with one
single command (I waste some time on filling the regs :) ). But I think it provides a good
and practical indication.
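
The relation between the tile and byte columns can be checked with a small sketch. It assumes one byte per pixel in 8-bit mode and half a byte per pixel in 4-bit mode. The tile counts and byte counts are copied from the tables above. `tile_bytes` is just a helper name for this illustration.

```python
TILE_PIXELS = 16 * 16  # one 16x16 tile

def tile_bytes(tiles, bits_per_pixel):
    """Bytes moved when copying `tiles` 16x16 tiles at the given color depth."""
    return tiles * TILE_PIXELS * bits_per_pixel // 8

# (mode, bits per pixel) -> (tiles per frame, bytes per frame), from the tables.
measured = {
    ("B1 60 Hz", 8): (77, 19712),
    ("B1 60 Hz", 4): (106, 13568),
    ("B1 50 Hz", 8): (94, 24064),
    ("B1 50 Hz", 4): (127, 16256),
    ("B1 60 Hz overscan", 8): (58, 14848),
    ("B1 60 Hz overscan", 4): (85, 10880),
}

for (mode, bpp), (tiles, reported) in measured.items():
    computed = tile_bytes(tiles, bpp)
    pixels = tiles * TILE_PIXELS
    print(f"{mode:18} {bpp}-bit: {tiles:3} tiles = {pixels:5} pixels, "
          f"{computed:5} bytes (table: {reported})")
```

Every row matches the posted byte count, and the 4-bit rows cover more pixels per frame while moving fewer bytes.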

So 20 kbyte in a screen similar to sc8 is a lot. It is about 5 times faster. I was happy
and excited until I tested it in a 4-bit mode... the darn thing seems to lose some time
on that nibble (dots) thing, and copies 6 kb less at 60 Hz (about 7.6 kb less at 50 Hz).
But it is still a bigger area and suited my needs more than enough... at least so I thought.

Let's come back to the P1 mode. 2 layers, 125 sprites, 256 sprite patterns, 4x16 colors.
Perfect for games, and yes, almost like the Genesis. But now the reason why I am so
sad/negative/disappointed :'( It appeared that 256 patterns weren't going to be enough:
all the stuff needed to be animated and there was no room for all those frames in SGEN.
So I thought copying frames would be a perfect solution, and I am not talking about every
small independent thing, just the main characters, since they have the most frames.
All easily within the 13 kbyte, but in P1 mode the memory is interleaved, and this causes
the number of bytes that can be transferred to drop once again, and in combination
with the sprites it drops to... just see for yourselves.

P1 60 Hz 

command| 16x16 | bytes
----------------------
LMMM   |   34  |  4352
BMXL   |   42  |  5376

Almost like you are in screen 5 (at 50 Hz, 192 lines). I am not happy... But at least
it is still faster than the V9958, even if not by far.
And do not forget about the 2 layers and 125 sprites; after all, that can't be copied either.

Oh... right... BMXL seems to be a bit faster. Not much, especially in the B1 modes, namely
varying between 1 and 4 tiles more, but here an entire kilobyte ^_^ So just ignore all the
crazy talk and use those tables (or not) before starting a project that's not doable.
But something like Sonic is still possible if you just keep low on the 'special' effects which
are taken for granted on other (game) systems.
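
To put the P1 numbers next to the B1 ones, here is a small sketch. It assumes the P1 tiles are 4 bits per pixel, which is what the posted byte counts imply (128 bytes per 16x16 tile). The tile counts come from the P1 60 Hz and B1 60 Hz tables.

```python
TILE_BYTES_4BIT = 16 * 16 * 4 // 8   # 128 bytes per 16x16 tile at 4 bits per pixel

p1_lmmm = 34 * TILE_BYTES_4BIT        # P1 60 Hz, LMMM
p1_bmxl = 42 * TILE_BYTES_4BIT        # P1 60 Hz, BMXL
b1_4bit_lmmm = 106 * TILE_BYTES_4BIT  # B1 60 Hz, 4-bit, LMMM

bmxl_gain = p1_bmxl - p1_lmmm         # extra bytes per frame with BMXL
p1_share = p1_lmmm / b1_4bit_lmmm     # P1 throughput relative to B1 4-bit

print("P1 LMMM bytes:", p1_lmmm, "P1 BMXL bytes:", p1_bmxl)
print("BMXL gain in bytes:", bmxl_gain)
print(f"P1 LMMM as a share of B1 4-bit LMMM: {p1_share:.0%}")
```

The BMXL gain works out to exactly one kilobyte per frame, and P1 LMMM moves roughly a third of what B1 manages in 4-bit mode.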

I feel a lot better now this is off my chest, and I do hope part of this post is useful for
some. And of course, if I am wrong I would like to be corrected. You know, getting some
feedback on the matter never hurts.

By msd

Paragon (1462)

04-11-2005, 18:49

Can I see the test code?

By ARTRAG

Enlighted (6567)

04-11-2005, 19:04

I do not know the V9990 very well; could you present similar
figures for the V9938/58 in order to have a direct comparison?

What about "The Revenge of The Last Dragon" for standard MSX2?
In TotalParody I started with an omnidirectional scroll with 2 layers in scr5 @50Hz;
it isn't undoable, even if very tricky.

By msd

Paragon (1462)

04-11-2005, 19:20

Did you test it on a turbo r in r800 mode?

By GhostwriterP

Hero (626)

05-11-2005, 12:09

Besides the in-game test I used the following code.
And I tested it at 3.5 MHz and 7 MHz, and surely 7 MHz is a bit faster. So the R800 will be a
little faster too, but I wanted to know what a Z80 can do. Which answers ARTRAG's question: I
did intend to make the game for standard MSX2 with Moonsound and GFX9000. Now I have
my doubts, so it is probably going to be a turbo R game (yes, I am not giving up).
Anyway, the code:

  org 100h

  module main

  xor a
;  inc a	;overscan
  out (67h),a

  ld a,6
  out (64h),a
  ld a,10000010b	;6  8-bit cl
;  ld a,10000001b	;6  4-bit cl
;  ld a,00000101b	;6  P1
  out (63h),a
  ld a,00000000b	;7	pal 1000b
  out (63h),a
  ld a,10000000b	;8	;
  out (63h),a
  ld a,0
  out (63h),a		;9
  out (63h),a		;10
  out (63h),a		;11
  out (63h),a		;12
  ld a,00000000b	;13
  out (63h),a
  xor a			;14
  out (63h),a
  inc a
  out (63h),a		;15 backdrop
  dec a
  out (63h),a		;adjust
  out (63h),a		;scroll stuff
  out (63h),a
  out (63h),a
  out (63h),a

  xor a
  out (64h),a	; register select
  out (63h),a
  out (63h),a
  out (63h),a

  halt
  halt
  ei
/*
  ld hl,gfxnaam		; load gfx
  call df.BuildFCB
  call df.OpenFile

  call readOut16k
  call readOut16k
  call readOut16k
  call readOut16k
  call readOutColor

  call df.CloseFile
*/
  ei
  ld b,200
wachtff
  halt
  djnz wachtff

  di


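1 in a,(65h)	;gfx9000 sync: wait while status bit 6 (vertical blank) is set
  and 64
  jr nz,1b
1 in a,(65h)	;then wait until bit 6 is set again, so we start right at the blank
  and 64
  jr z,1b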

  call CopyBlankPeriod
  call CopyDisplayPeriod

  ei

  ld b,250
wachtff2
  halt
  djnz wachtff2

exit
  ld hl,(COPIES)
  ld (loWord),hl
  call printaantal
  xor a
  ld ix,0d1h
  ld iy,(0faf7h)
  call 1ch
  ld ix,141h
  ld iy,(0faf7h)
  call 1ch
  ld ix,156h
  ld iy,(0fcc0h)
  call 1ch
  ld de,Txthoi
  ld c,9
  call 5
  ret

printaantal
  ld ix,Txthoi

  ld hl,hextab
  ld d,0
  ld a,(loWord+1)
  srl a
  srl a
  srl a
  srl a
  ld e,a
  add hl,de
  ld a,(hl)
  ld (ix+0),a

  ld hl,hextab
  ld a,(loWord+1)
  and 15
  ld e,a
  add hl,de
  ld a,(hl)
  ld (ix+1),a

  ld hl,hextab
  ld a,(loWord)
  srl a
  srl a
  srl a
  srl a
  ld e,a
  add hl,de
  ld a,(hl)
  ld (ix+2),a

  ld hl,hextab
  ld a,(loWord)
  and 15
  ld e,a
  add hl,de
  ld a,(hl)
  ld (ix+3),a

  ret

CopyBlankPeriod
  in a,(65h)	;wait while status bit 0 is set (previous command still running)
  and 1
  jr nz,CopyBlankPeriod

  ld a,32
  ld bc,16*256+63h
  out (64h),a
  xor a
  out (63h),a	;32	source x
  out (63h),a	;33
  out (63h),a	;34	source y
  out (63h),a	;35

  out (c),b	;36	destination x
  out (63h),a	;37
  out (c),b	;38	destination y
  out (63h),a	;39

  out (c),b	;40	number of dots/bytes x
  out (63h),a	;41
  out (c),b	;42	number of dots/bytes y
  out (63h),a	;43

  out (63h),a	;44 
  ld a,11100b		;tpset     pset 01100b
  out (63h),a	;45	;log op 

  ld a,255
  out (63h),a	;46	;write mask 
  out (63h),a	;47 

  ld a,52
  out (64h),a
  ld a,01000000b	;LMMM

;  ld a,10000000b	;BMXL


  out (63h),a

  ld hl,(COPIES)
  inc hl
  ld (COPIES),hl

  in a,(65h)
  and 64	;status bit 6: still in the blank period?
  ret z		;no, the display period has started
  jp CopyBlankPeriod

CopyDisplayPeriod
  in a,(65h)	;wait while status bit 0 is set (previous command still running)
  and 1
  jr nz,CopyDisplayPeriod

  ld a,32
  ld bc,16*256+63h
  out (64h),a
  xor a
  out (63h),a	;32	source x
  out (63h),a	;33
  out (63h),a	;34	source y
  out (63h),a	;35

  out (c),b	;36	destination x
  out (63h),a	;37
  out (c),b	;38	destination y
  out (63h),a	;39

  out (c),b	;40	N x
  out (63h),a	;41
  out (c),b	;42	N y
  out (63h),a	;43

  out (63h),a	;44 
  ld a,11100b
  out (63h),a	;45	;log op 

  ld a,255
  out (63h),a	;46	;write mask 
  out (63h),a	;47 

  ld a,52
  out (64h),a
  ld a,01000000b

;  ld a,10000000b	;BMXL

  out (63h),a

  ld hl,(COPIES)
  inc hl
  ld (COPIES),hl

  in a,(65h)
  and 64	;status bit 6: has the next blank period started?
  ret nz	;yes, one full frame has passed
  jp CopyDisplayPeriod

/*
readOut16k
  ld hl,16384
  ld de,8000h
  call df.ReadFile
  ld a,64
  ld bc,60h
  ld hl,8000h
1 otir
  dec a
  jr nz,1b
  ret

readOutColor
  ld hl,48*3
  ld de,kleurtabel
  call df.ReadFile

  ld a,2
  ld c,61h
  ld hl,kleurtabel
2 ld b,128		;send the colors
1 outi
  outi
  outi
;  out (c),0
  djnz 1b
  dec a
  jr nz,2b
  ret
*/


kleurtabel
  block 192

gfxnaam
  byte "TESTPLT1C64"

hextab
  byte 48,49,50,51,52,53,54,55,56,57,97,98,99,100,101,102  

Txthoi
  byte "0000$"

loWord
  word 0

COPIES
  word 0,0,0,0,0,0,0,0

  endmodule

;  include dskio.i

  end

By ro

Scribe (4532)

05-11-2005, 12:29

you're italian?

so uhrm, let's dump that whole gfx module and stop wasting time. we've gotta get with the program and do some standard msx(2) stuff again. enough with the extensions. psg for ever. screen5's pretty fast if you do some good coding. ooh come on!

(nice article btw. thanx)

By msd

Paragon (1462)

05-11-2005, 12:31

You only need to write this once:
out (63h),a ;44
ld a,11100b ;tpset pset 01100b
out (63h),a ;45 ;log op

ld a,255
out (63h),a ;46 ;write mask
out (63h),a ;47

Not again for every command.
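
To get a feel for how much this saves, the port writes in the CopyBlankPeriod loop from the listing earlier in the thread can be counted. This sketch only counts OUT instructions per copy. It says nothing about actual cycle timing, and it assumes registers 44 to 47 keep their values between commands, as this suggestion implies.

```python
# Port writes (OUT instructions) issued per copy in the posted test loop.
select_writes = 2       # out (64h): select register 32, later register 52
coordinate_writes = 12  # registers 32..43: source, destination, size
setup_writes = 4        # registers 44..47: register 44, log op, write mask
opcode_writes = 1       # register 52: LMMM or BMXL opcode

per_copy_all = select_writes + coordinate_writes + setup_writes + opcode_writes
per_copy_trimmed = per_copy_all - setup_writes
saved_share = setup_writes / per_copy_all

print("writes per copy as posted:", per_copy_all)
print("writes per copy with 44..47 set once:", per_copy_trimmed)
print(f"share of port writes saved: {saved_share:.0%}")
```

That is 4 out of 19 writes per command, so the effect grows with the number of copies per frame.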

By GhostwriterP

Hero (626)

05-11-2005, 12:51

I thought it might not be needed, but it is a test and I just prefer to make a test
a bit slower so it is a better indication for real-life programs. But it is definitely
a bit faster (a lot if you have a lot of copies :) ).

@ro: Everything is already done for MSX2. I prefer something that's not been done
before or that I have not done before. So I stick to either MSX or GFX9000 for now.

By ARTRAG

Enlighted (6567)

05-11-2005, 13:03

@GhostwriterP
I am not so sure that everything has been done...
Look at the scr4 command topic for example, or at my scr5 platform with fine 2-layer scroll...
I could say that those examples are only ideas waiting to be developed
in a true project...

By GhostwriterP

Hero (626)

05-11-2005, 13:27

True, but I am not going to be the one to develop that.
I am busy enough I would say (more than 3 projects), and then there is vscreen
for those platform games, based on an already proven concept. I am afraid it is not
that appealing to me. But I like your effort though.

By Maggoo

Paragon (1214)

05-11-2005, 14:55

@GhostwriterP: I haven't used a V9990 in a looong time, but I remember experimenting with it back in the good old days. That VDP is pretty fast (for copy commands and the like). What kind of speed issues do you see, or what effect would you like to do that you think the V9990 can't do? When it comes to scrolling and sprites, I think it's pretty close to what a Megadrive or a NEC PC Engine can do.

On a side note, I did my testing using a Turbo R, which was a big plus. With the V9938/58, the VDP commands are kinda slow, which results in the Z80 waiting a lot. With the V9990 it was the opposite: the Z80 was slow compared to the VDP, and the R800 improved things a lot. Anyway, if you plan on using the Moonsound, you'll definitely need an R800, as the replayers consume quite some CPU time.
