R800 clock cycles per instruction

Page 1 of 2

By WORP3

Paladin (864)

25-06-2011, 12:41

Does anyone have an R800 instruction list that includes the clock cycles per instruction?
I need it to recalculate the timing of some FM-PAC access routines.

Also, does anyone have a machine-language routine that detects the CPU mode of the turbo R (Z80 or R800 mode)?

Cheers,
Tjeerd.

By Edwin

Paragon (1182)

25-06-2011, 13:26

Instruction timings for the R800 are included here on MAP (the MSX Assembly Page). But note that it's not quite as simple as with the Z80.

As for CPU detection: in Wings I chose between Z80, 7 MHz Z80 and R800 by counting the number of times a piece of code could execute between two VBLANK interrupts. I don't think there is another way.
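A minimal Python sketch of the classification step only, assuming the MSX side has already counted how many times a fixed loop ran between two VBLANK interrupts. The loop cost (`loop_cycles`), the 60 Hz frame rate and the speed-up factors in `MODES` are assumptions for illustration, not values taken from Wings; the R800 factor in particular is a placeholder that would have to be calibrated on real hardware for the actual loop.

```python
import math

Z80_HZ = 3_579_545  # standard MSX Z80 clock in Hz
FRAME_HZ = 60       # assumed NTSC VBLANK rate (use 50 on PAL machines)

# Hypothetical speed-up factors relative to a 3.58 MHz Z80.
# The R800 value is a placeholder: calibrate it for the real loop.
MODES = {"Z80": 1.0, "Z80 7MHz": 2.0, "R800": 5.0}


def baseline_count(loop_cycles: int, frame_hz: int = FRAME_HZ) -> float:
    """Iterations a 3.58 MHz Z80 completes in one frame for a loop
    costing loop_cycles T-states per iteration."""
    return Z80_HZ / frame_hz / loop_cycles


def classify(count: int, loop_cycles: int, frame_hz: int = FRAME_HZ) -> str:
    """Return the mode whose assumed speed-up is closest (on a log scale)
    to the measured count divided by the Z80 baseline."""
    ratio = count / baseline_count(loop_cycles, frame_hz)
    return min(MODES, key=lambda mode: abs(math.log(ratio / MODES[mode])))


if __name__ == "__main__":
    # With a 50 T-state loop, a plain Z80 manages about 1193 iterations per frame.
    print(classify(1193, 50))  # Z80
    print(classify(6000, 50))  # R800
```

Comparing on a log scale keeps the decision symmetric, so a count twice the expected value is treated as just as far off as a count half the expected value.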

By hit9918

Prophet (2927)

25-06-2011, 14:59

> Instruction timings for the R800 are included here on MAP. But note that it's not quite as simple as with the Z80.

It is a speed monster!
LD A,(HL) in 2 cycles, while an unexpanded Amiga 1200 needs 6 cycles (12 at 14 MHz).

How many cycles does the port #98 brake cost?
Does the brake kick in on EVERY OUT, or only when a write is already outstanding?
In the latter case, after an OUTI one would have a few dozen cycles to do something else.

p.s.
the site lists LD A,(IY+o) as 1 cycle, bug :P

By Edwin

Paragon (1182)

25-06-2011, 15:25

> It is a speed monster!
> LD A,(HL) in 2 cycles, while an unexpanded Amiga 1200 needs 6 cycles (12 at 14 MHz).

I'm more partial to ADD HL,DE myself. ;)

> How many cycles does the port #98 brake cost?
> Does the brake kick in on EVERY OUT, or only when a write is already outstanding?

It delays every OUT to the VDP if the previous OUT was fewer than a certain number of cycles before it. I don't remember exactly how many, but it's a great deal more than necessary, and therefore slower than Z80 mode if you were to OUT at maximum speed.

> the site lists LD A,(IY+o) as 1 cycle, bug :P

Likely. I think the one above it is right.

By PingPong

Prophet (4093)

25-06-2011, 15:35

By hit9918

Prophet (2927)

25-06-2011, 16:45

If a CPU switch can be done in a couple of raster lines, could one then use the Z80 as the OUTI dumper?

I just got this idea: the R800 prepares multiple OUT jobs for the Z80, to minimize the number of CPU switches. The command buffer:

codeaddress, hl, bc, codeaddress, hl, bc ...

Load the command buffer address into the Z80 stack pointer. The whole thing starts with the Z80 doing a RET, so it jumps to the first code address. That could be the "copy8m" code, which copies a multiple of 8 bytes:

copy8m:
	pop hl ;source address
	pop bc ;B = byte count (multiple of 8, 0 = 256), C = port
loop:
	outi
	outi
	outi
	outi
	outi
	outi
	outi
	outi
	jp nz,loop ;OUTI sets Z once B reaches 0
	RET		;fetch next codeaddress from commandbuffer and jump to it

Another example command buffer op: set the VRAM address

setwrt:
	ld c,0x99
	pop hl
	out (c),l
	out (c),h ;setting bit 6 of H (write mode) is the R800's job
	RET
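To make the command buffer idea concrete, here is a small Python model (not MSX code) that walks the buffer the way the Z80 would with RET and POP, and records the resulting OUT writes. The code addresses `COPY8M_ADDR` and `SETWRT_ADDR` are hypothetical, and the model simply stops at the end of the buffer, whereas the real buffer would need a final code address that hands control back (for example, to switch back to the R800).

```python
from typing import Callable

# Hypothetical addresses where copy8m and setwrt would live in Z80 memory.
COPY8M_ADDR = 0xC000
SETWRT_ADDR = 0xC020


def run_command_buffer(buffer: list[int], memory: bytes) -> list[tuple[int, int]]:
    """Interpret a buffer of 16-bit words (a code address followed by its
    arguments) and return the (port, value) pairs written by OUT/OUTI."""
    outs: list[tuple[int, int]] = []
    sp = 0  # index of the next word; plays the role of the Z80 stack pointer

    def pop() -> int:
        nonlocal sp
        word = buffer[sp]
        sp += 1
        return word

    def copy8m() -> None:
        hl = pop()                      # pop hl: source address
        bc = pop()                      # pop bc: B = byte count, C = port
        b, c = bc >> 8, bc & 0xFF
        count = b if b else 256         # B = 0 wraps around to 256 bytes
        assert count % 8 == 0, "copy8m needs a multiple of 8 bytes"
        for _ in range(count):          # each OUTI: out (c),(hl); inc hl; dec b
            outs.append((c, memory[hl]))
            hl = (hl + 1) & 0xFFFF

    def setwrt() -> None:
        hl = pop()                      # VRAM address, write bit already set by the R800
        outs.append((0x99, hl & 0xFF))  # out (c),l
        outs.append((0x99, hl >> 8))    # out (c),h

    handlers: dict[int, Callable[[], None]] = {COPY8M_ADDR: copy8m, SETWRT_ADDR: setwrt}
    while sp < len(buffer):
        handlers[pop()]()               # RET: jump to the next code address
    return outs


if __name__ == "__main__":
    ram = bytes(i & 0xFF for i in range(0x10000))
    # Set the VRAM write address to 0 (bit 6 of H set), then send 8 bytes from 0x0010.
    cmds = [SETWRT_ADDR, 0x4000, COPY8M_ADDR, 0x0010, (8 << 8) | 0x98]
    print(run_command_buffer(cmds, ram))
```

The word order in `cmds` mirrors the buffer layout "codeaddress, hl, bc", because POP reads the lowest address first.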

By WORP3

Paladin (864)

25-06-2011, 19:38

Uhhh, does anyone have an example of how to detect whether an MSX is running in turbo mode?

By WORP3

Paladin (864)

25-06-2011, 19:55

Thanks for the link and the timing idea!

By wouter_

Champion (508)

25-06-2011, 20:08

> LD A,(HL) in 2 cycles ...
Most (all?) R800 timing tables indeed list 2 cycles for an LD A,(HL) instruction. But that doesn't count the penalty cycles for switching between opcode fetches and data reads/writes. If you actually measure the speed of this instruction, you'll see that it takes 4 cycles!
- 1 (or 2) cycles to fetch the instruction opcode
- 2 cycles to read a data byte from memory (1 for the actual read + 1 penalty cycle because we switched from opcode fetches to data reads)
- next opcode fetch will also take 1 cycle extra (because we switch from data read to opcode fetch)
So in total it's 4 cycles, depending on exactly how you want to count. In any case, if you execute an 'LD A,(HL)' instruction N times, it takes 4×N cycles (ignoring the other penalty effects).
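The counting above can be reproduced with a toy Python model. It is a simplification, assuming every bus access costs 1 cycle and every switch between opcode fetch and data access costs 1 extra cycle; it ignores the "1 (or 2)" opcode-fetch case and the other penalty effects.

```python
def bus_cycles(accesses: list[str], prev: str = "fetch") -> int:
    """Cycles for a stream of 'fetch' (opcode) and 'data' bus accesses:
    1 cycle per access, plus 1 penalty cycle whenever the kind of access
    differs from the previous one (simplified model)."""
    cycles = 0
    for kind in accesses:
        cycles += 1
        if kind != prev:
            cycles += 1  # switch between opcode fetch and data access
        prev = kind
    return cycles


def ld_a_hl_cycles(n: int) -> int:
    """Cost of n consecutive LD A,(HL): one opcode fetch and one data read
    each. The next instruction's opcode fetch is included to pick up its
    switch penalty, then its own 1 base cycle is subtracted again."""
    accesses = ["fetch", "data"] * n + ["fetch"]
    return bus_cycles(accesses) - 1


if __name__ == "__main__":
    print(ld_a_hl_cycles(1))   # 4
    print(ld_a_hl_cycles(10))  # 40
```

Charging the switch-back penalty to the LD A,(HL) rather than to the following instruction is one of the "depending on how you want to count" choices; either way, N instructions cost 4×N cycles in this model.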

For details, download the openMSX source code and read the document doc/r800test.txt. The other files in that directory that start with 'r800' may also be interesting.

> How many cycles does the port #98 brake cost?
> Does the brake kick in on EVERY OUT, or only when a write is already outstanding?

I did some measurements some time ago (see doc/turbor-vdp-io-timing.ods in the openMSX source code). On a turbo R (R800 mode), the time from VDP I/O to VDP I/O is at least 62 clock cycles. The 'IN A,(xx)' or 'OUT (xx),A' instruction itself takes 10 cycles. So there's still room for 52(!) extra cycles before you start seeing an actual extra delay (at least approximately, see that document for more details).
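A back-of-the-envelope Python sketch of what those numbers mean for code placed between VDP accesses. It assumes the 62-cycle spacing acts as a hard minimum and that the other work between two OUTs simply adds its cycles; the measured figures are approximate, so treat the results as estimates.

```python
MIN_VDP_IO_SPACING = 62  # measured minimum cycles from VDP I/O to VDP I/O (R800 mode)
IO_INSTR_CYCLES = 10     # cycles for IN A,(n) / OUT (n),A itself


def vdp_io_interval(filler_cycles: int) -> int:
    """Cycles between two consecutive VDP OUTs when filler_cycles of other
    work sit between them; shorter gaps are stretched to the minimum."""
    return max(MIN_VDP_IO_SPACING, IO_INSTR_CYCLES + filler_cycles)


def wasted_cycles(filler_cycles: int) -> int:
    """Cycles spent waiting on the VDP I/O delay instead of doing work."""
    return vdp_io_interval(filler_cycles) - (IO_INSTR_CYCLES + filler_cycles)


if __name__ == "__main__":
    for filler in (0, 30, 52, 60):
        print(filler, vdp_io_interval(filler), wasted_cycles(filler))
```

For example, 30 cycles of useful work between OUTs still leave 22 cycles of waiting, and only from about 52 cycles onward does the delay stop costing anything.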

By NYYRIKKI

Enlighted (6016)

25-06-2011, 22:16

> Uhhh, does anyone have an example of how to detect whether an MSX is running in turbo mode?

Well... my top 3 ways that I've used:
- The BIOS routine at #183 (GETCPU)
- A direct CPU mode check through I/O #E5 & #E6
- Executing MULUB and comparing the results
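A Python sketch of the decision logic behind the first method. It assumes the standard MSX conventions that the byte at #002D holds the MSX version (3 = turbo R) and that the turbo R BIOS routine GETCPU at #183 returns 0 for Z80 mode, 1 for R800 ROM mode and 2 for R800 DRAM mode. `read_byte` and `call_getcpu` are hypothetical stand-ins for a real memory read and BIOS call. Older machines have no GETCPU entry, so it must not be called there.

```python
from typing import Callable

MSXVER_ADDR = 0x002D  # MSX version byte: 0 = MSX1, 1 = MSX2, 2 = MSX2+, 3 = turbo R
GETCPU_MODES = {0: "Z80", 1: "R800 ROM", 2: "R800 DRAM"}


def cpu_mode(read_byte: Callable[[int], int], call_getcpu: Callable[[], int]) -> str:
    """Return the current CPU mode name. call_getcpu (hypothetical wrapper
    for BIOS #183) is only invoked on a turbo R, whose BIOS provides it."""
    if read_byte(MSXVER_ADDR) < 3:
        return "Z80"  # pre-turbo-R machines only have a Z80
    return GETCPU_MODES[call_getcpu()]


if __name__ == "__main__":
    turbo_r = lambda addr: 3 if addr == MSXVER_ADDR else 0
    print(cpu_mode(turbo_r, lambda: 2))  # R800 DRAM
```

Note that this only tells the turbo R CPU modes apart; a Z80 running at a higher clock on an older machine would still report "Z80", which is where the timing approach from earlier in the thread comes in.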

By hit9918

Prophet (2927)

25-06-2011, 22:28


Most (all?) R800 timing tables indeed list 2 cycles for a LD A.(HL) instruction. But that doesn't count the penalty cycles for switching between opcode fetching and data read/writes.

This reminds me of what I have since read about 256-byte DRAM pages: code is typically in a different page than the data, and I guess that is what "switching" means. That would mean that with code and data in the same page, LD A,(HL) should take 2 cycles.
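To compare this guess with the fetch/data explanation, here is a Python sketch of the hypothesis, assuming (unverified) that the extra cycle comes from leaving the current 256-byte DRAM page rather than from switching between opcode fetches and data accesses. The code and data addresses are hypothetical.

```python
def paged_cycles(addresses: list[int], prev_page: int) -> int:
    """1 cycle per access plus 1 penalty cycle whenever an access falls in
    a different 256-byte page than the previous access (hypothesis only)."""
    cycles = 0
    for addr in addresses:
        page = addr >> 8
        cycles += 1
        if page != prev_page:
            cycles += 1  # assumed DRAM page-change penalty
        prev_page = page
    return cycles


def ld_a_hl_loop(code_addr: int, data_addr: int, n: int) -> int:
    """n unrolled LD A,(HL) opcodes starting at code_addr, all reading
    data_addr. The next opcode fetch is included for its penalty, then
    its own 1 base cycle is subtracted again."""
    accesses: list[int] = []
    for i in range(n):
        accesses += [code_addr + i, data_addr]
    accesses.append(code_addr + n)
    return paged_cycles(accesses, prev_page=code_addr >> 8) - 1


if __name__ == "__main__":
    print(ld_a_hl_loop(0x4000, 0x4080, 4))  # same page: 8
    print(ld_a_hl_loop(0x4000, 0x8000, 4))  # different page: 16
```

Under this hypothesis the different-page case reproduces the 4 cycles per instruction described above, while the same-page case drops to 2; only a measurement on real hardware can tell the two models apart.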


So there's still room for 52(!) extra cycles before you start seeing an
actual extra delay (at least approximately, see that document for more details).

It is cool that you measured all that!

I used the extra cycles this way in the character loader of the smooth scroller: between the OUTI instructions, it fetches/adds the address of the next tile.
