*NYYRIKKI* wrote:

With values this means:

PRINT 8*14*((1/3579545*(16+1))/(253-135))

This gives me 4.5 us / Pixel

Similar calculation for border picture gives me 3.7 us / pixel

Is that correct?

Umh, I don't know. In my tests I get 6.98 us to plot a pixel (200,200).

I think your results are too fast. At 3.7 us per pixel you get a theoretical rate of 270,000 pixels/second. Is that possible?

A 512x212 logical copy takes almost 1 second, and there are about 108,000 pixels to process, so maybe it is correct? A copy (almost) doubles the VRAM accesses compared to a logical fill. I'm making this comparison with LINE because I think the main bottleneck is the VRAM bandwidth, not the kind of command the hardware is executing.
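For anyone who wants to redo this sanity check, here is a minimal Python sketch. The function name and the `SCREEN_PIXELS` constant are only illustrative; the per-pixel times are the ones quoted in this thread.

```python
# Convert a per-pixel time into a pixel rate and a full-screen time.
# SCREEN_PIXELS assumes the 512x212 screen mentioned above.
SCREEN_PIXELS = 512 * 212


def pixels_per_second(us_per_pixel):
    """Theoretical pixel rate for a given time per pixel in microseconds."""
    return 1.0 / (us_per_pixel * 1e-6)


for us in (3.7, 4.5, 6.98):
    rate = pixels_per_second(us)
    seconds = SCREEN_PIXELS / rate
    print(f"{us} us/pixel -> {rate:,.0f} pixels/s, "
          f"{SCREEN_PIXELS} pixels in {seconds:.2f} s")
```

This only turns the quoted numbers into rates; it says nothing about which measurement is the right one.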

*PingPong* wrote:

*wouter_* wrote:

Euhm, the way I read the graph is that lines at 0 and 90 degrees are fastest and lines at 45 degrees are slowest (the speed of lines at all other angles lies between these two extremes).

I agree. I also got the same results: 45° is the slowest.

True, I did the math from the pic and it seems this case gives 6 us / pixel when sprites are on... (Still not sure if the math is correct)

*NYYRIKKI* wrote:

Time to draw one pixel on horizontal line: (Real hardware, Red color 8 test points from "sprites on" picture: X=135, Y=100 & X=253, Y=100 Cycles with 14 colors in between: 8)

rotation cycles * colors in cycle * ((1 sec / clockspeed in Hz * (OUTI T-states + M1)) / (X2-X1))

With values this means:

PRINT 8*14*((1/3579545*(16+1))/(253-135))

I think you got the time of OUTI wrong. OUTI is 18 clock cycles (there are 2 extra M1 cycles in MSX for opcodes that start with 0xED). But doesn't your test use OTIR instead of OUTI? OTIR is 23 cycles per iteration. If I replace 16+1 with 23 in your calculation then I obtain numbers that approximate the cycle counts that are used in openMSX (within 5%-10%).
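To make it easy to compare the different cycle counts, here is the same formula as a small Python sketch. The function and parameter names are made up for illustration; the numbers (8 rotation cycles, 14 colors, X=135 and X=253, 3579545 Hz) are the ones from the quoted post.

```python
# Time per pixel derived from the color pattern on a horizontal line.
# Every color change corresponds to one OUT to the VDP palette, so the
# time between two test points is (number of OUTs) * (cycles per OUT).
Z80_CLOCK_HZ = 3_579_545


def us_per_pixel(cycles_per_out, rotations=8, colors=14, x1=135, x2=253):
    """Microseconds per pixel for a given Z80 cycle count per OUT."""
    outs = rotations * colors
    seconds = outs * cycles_per_out / Z80_CLOCK_HZ
    return seconds / (x2 - x1) * 1e6


for label, cycles in (("16+1", 17), ("OUTI", 18), ("OTIR", 23)):
    print(f"{label:>4} ({cycles} cycles): {us_per_pixel(cycles):.2f} us/pixel")
```

With 17 cycles this reproduces the roughly 4.5 us/pixel from the original PRINT statement; with 23 cycles it comes out at roughly 6.1 us/pixel.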

*NYYRIKKI* wrote:

It seems that the current emulation thinks the speed is the same when drawing the border or when sprites are off, but that is not true in real life.

I just checked: blueMSX indeed uses identical cycle counts for sprites-off and display-off (120,120,147). But openMSX uses (120,132,147). At least these ratios seem to approximately match the results from the real machine.

BTW: I literally added two lines to the openMSX source code and that gives me the following picture when I run your TEST2.BAS test program. This is very much work in progress (don't expect it yet in the upcoming 0.9.0 release), but at least it shows it's not too hard to get more accurate emulation results.

I just wrote similar test program in BASIC...

You should take the old pixel time you had and make the new pixel time 2/3 of the original value. Then you should add X-step and Y-step pixel times that are 1/3 of the original pixel time. This gives very much the same looking shape as the real computer.

Here is the quick and dirty code:

    10 BLOAD"xbasic.bin",R
    20 SCREEN 5
    30 _TURBO ON
    40 GOTO 70
    50 GOSUB 230
    60 RETURN
    70 S=S+1:T=RND(-S)
    80 SX=128:SY=100
    90 DY=200:FOR DX=0 TO 254
    100 GOSUB 50
    110 NEXT DX
    120 FOR DY=200 TO 1 STEP -1
    130 GOSUB 50
    140 NEXT DY
    150 FOR DX=255 TO 1 STEP -1
    160 GOSUB 50
    170 NEXT DX
    180 FOR DY=0 TO 199
    190 GOSUB 50
    200 NEXT DY
    210 IF INKEY$=""THEN 210
    220 END
    230 TP=15
    240 IF ABS(SX-DX)INT(X))
    410 YT=-(INT(VY)<>INT(Y))
    420 'TP=TP+.8 (original value)
    430 TP=TP+.53+XT*.26+YT*.26
    440 PSET (X,Y),TPMOD14+2
    450 VX=X:VY=Y
    460 RETURN

WTF where are my lines between 250-400 ??? I can see them on edit!

[Edit] Seems to be a problem with the greater-than and less-than characters... Fix the god damn forum!

Actually, if you add 2 spaces in line 240, it works!

    10 BLOAD"xbasic.bin",R
    20 SCREEN 5
    30 _TURBO ON
    40 GOTO 70
    50 GOSUB 230
    60 RETURN
    70 S=S+1:T=RND(-S)
    80 SX=128:SY=100
    90 DY=200:FOR DX=0 TO 254
    100 GOSUB 50
    110 NEXT DX
    120 FOR DY=200 TO 1 STEP -1
    130 GOSUB 50
    140 NEXT DY
    150 FOR DX=255 TO 1 STEP -1
    160 GOSUB 50
    170 NEXT DX
    180 FOR DY=0 TO 199
    190 GOSUB 50
    200 NEXT DY
    210 IF INKEY$=""THEN 210
    220 END
    230 TP=15
    240 IF ABS(SX-DX) < ABS(SY-DY) THEN 320
    250 ' X is long
    260 YL=(DY-SY)/ABS(SX-DX):Y=SY
    270 FOR X=SX TO DX STEP SGN(DX-SX)
    280 GOSUB 390
    290 Y=Y+YL
    300 NEXT X
    310 RETURN
    320 ' Y is long
    330 XL=(DX-SX)/ABS(SY-DY):X=SX
    340 FOR Y=SY TO DY STEP SGN(DY-SY)
    350 GOSUB 390
    360 X=X+XL
    370 NEXT Y
    380 RETURN
    390 '
    400 XT=-(INT(VX)<>INT(X))
    410 YT=-(INT(VY)<>INT(Y))
    420 'TP=TP+.8 (original value)
    430 TP=TP+.53+XT*.26+YT*.26
    440 PSET (X,Y),TP MOD 14+2
    450 VX=X:VY=Y
    460 RETURN

@mars2000you Thanx!

@wouter_, the pic looks great, problem solved.

Except for funny cases, the emu behaves like the real machine.

Please tell us the cycle counts. What clock frequency (MHz) is the video emulation based on?

*NYYRIKKI* wrote:

I just wrote similar test program in BASIC...

Thanks, this is really useful. Which of the three scenarios (screen/sprite on/off) does this model try to approximate? I have the impression that the spacing between the colors matches the screen=off scenario well, but the slope of the sides of the octagon matches the screen=on/sprite=off scenario better.

Since the shape seems to be symmetrical when mirrored horizontally/vertically it might be possible to get more accurate result when you draw lines from the corner of the screen instead of from the center.

BTW to get really accurate measurements I once did this:

* on a turboR machine, switch to Z80 mode (makes VDP IO faster)

* set up the command (write all registers except for the CMD register itself)

* wait for start of VBLANK (maybe not necessary for screen=off, but gives more repeatable results in the other modes)

* reset MSX-TurboR E6-timer

* start VDP command

* insert a delay (varies between exactly 10-37 cycles)

* poll for the command to finish (IN A,(#99) RRA JP C,-6 <-- this loop takes 28 cycles in Z80 mode)

* read the value of the timer

Because the poll loop takes 28 cycles, it's possible that the command finishes right after the IN instruction, so you notice the end of the command up to a full loop (28 cycles) too late. That's why there is an explicit delay right before the poll loop. We repeat the experiment 28 times, each time with a different delay (varying between 10 and 37 cycles, because I couldn't find a sequence of Z80 instructions that takes exactly 9 cycles). From all these 28 experiments we take the lowest result.

The turboR timer at ports E6-E7 ticks once every 14 Z80 cycles, so that's also the accuracy of this measurement (note that there's also a fixed offset because the timer and the VDP command are not started at exactly the same moment). Alex Wulms once built a hardware circuit that contains a (32-bit) timer that ticks every Z80 cycle. Combining that timer with the algorithm above should in principle allow measuring with 1 Z80 cycle accuracy (but for the VDP that's still 6 times too slow).
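To illustrate why taking the minimum over 28 different delays removes the poll-loop uncertainty, here is a simplified Python model of the procedure. It is a sketch under assumptions: it pretends the status is sampled exactly at `delay + k*28` cycles after the command start, and it ignores the fixed offset between starting the timer and starting the command. All names are invented for the example.

```python
POLL_LOOP_CYCLES = 28   # length of the IN / RRA / JP C loop in Z80 mode
TIMER_TICK_CYCLES = 14  # the E6 timer ticks once every 14 Z80 cycles


def detected_end(true_duration, delay):
    """Cycle at which the poll loop first samples a finished command."""
    if delay >= true_duration:
        return delay
    # Ceiling division: number of extra loop iterations that are needed.
    polls = -(-(true_duration - delay) // POLL_LOOP_CYCLES)
    return delay + polls * POLL_LOOP_CYCLES


def measure(true_duration, delays=range(10, 38)):
    """Lowest detected end over all delays, raw and as seen by the timer."""
    raw = min(detected_end(true_duration, d) for d in delays)
    quantized = (raw // TIMER_TICK_CYCLES) * TIMER_TICK_CYCLES
    return raw, quantized


# A single run with delay 10 overshoots; the minimum over 28 delays does not.
print(detected_end(1000, 10))
print(measure(1000))
```

In this model one of the 28 delays always lines up with the end of the command, so the remaining error comes only from the 14-cycle timer resolution.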

I didn't get very far yet with these measurements because they take a long time and it's easy to make mistakes. And then my interests shift elsewhere. As always, help would be appreciated.

*NYYRIKKI* wrote:

You should take the old pixel time you had and make the new pixel time 2/3 of the original value. Then you should add X-step and Y-step pixel times that are 1/3 of the original pixel time. This gives very much the same looking shape as the real computer.

Or simpler don't change the pixel time, and only add some time when there's a step in the minor-direction ;-) (The Bresenham line algorithm _always_ takes a step in the major direction).
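A small Python sketch of why both formulations come down to the same thing. The weights .53 and .26 are taken from the BASIC program earlier in the thread; the 0.79 pixel cost in the second function is simply .53+.26 and is my own assumption, as are all function names. Special handling of the first pixel of a line is ignored.

```python
def bresenham_steps(dx, dy):
    """Return (major_steps, minor_steps) for a Bresenham line of (dx, dy)."""
    major, minor = max(abs(dx), abs(dy)), min(abs(dx), abs(dy))
    err = major // 2
    minor_steps = 0
    for _ in range(major):  # one step along the major axis per pixel
        err -= minor
        if err < 0:         # error overflow: also step along the minor axis
            err += major
            minor_steps += 1
    return major, minor_steps


def cost_base_plus_both_steps(dx, dy, base=0.53, step=0.26):
    """Fixed base cost per pixel plus a cost for every X step and Y step."""
    major, minor = bresenham_steps(dx, dy)
    return major * (base + step) + minor * step


def cost_minor_step_only(dx, dy, pixel=0.79, step=0.26):
    """Unchanged pixel cost plus extra cost only on minor-direction steps."""
    major, minor = bresenham_steps(dx, dy)
    return major * pixel + minor * step


for dx, dy in ((100, 0), (100, 50), (100, 100), (0, 100)):
    print((dx, dy), cost_base_plus_both_steps(dx, dy),
          cost_minor_step_only(dx, dy))
```

Both functions rank the lines the same way: 0 and 90 degrees are cheapest and 45 degrees is the most expensive, which is the shape seen in the pictures.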