adding a NOP between out (099h),a in a,(098h) seems to solve...
So is it a "fast" access problem? Is it on TR only or on any VDP?
I don't know much about access timing on the V99x8, but from reading the code, it would definitely cause problems on an MSX1. The out to $99 sets the address in read mode, which means the VDP internally starts reading from VRAM at that point, so an in from $98 right after it can arrive before the byte is ready. It is comparable to this:
[...]
or 64
out ($99),a ; set address in write-mode
in a,($98)
in a,($98) ; uh oh i'm too fast
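For comparison, here is a minimal sketch of a read that leaves a gap between the address setup and the data read. The label RDVRAM_SAFE and the second NOP are assumptions for illustration, not something from the code under discussion, and how much padding a given machine really needs is not established here.

```
; Hypothetical helper: read one byte from VRAM address HL (0..#3FFF).
; Assumes the caller has disabled interrupts, so the two control-port
; writes are not split by an interrupt handler that touches the VDP.
RDVRAM_SAFE:
LD A,L
OUT (#99),A ; address low byte
LD A,H
AND #3F ; bits 7 and 6 clear = read mode
OUT (#99),A ; address high byte, VDP starts fetching the byte
NOP ; padding between the address setup and the read
NOP ; extra margin (assumption, tune per machine)
IN A,(#98) ; fetch the byte
RET
```

Any useful instruction that does not touch the VDP can stand in for the NOPs.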
A simple LD A,$3E will also solve the problem, and you will get a wonderful T-State for free.
Tested even on "strange" MSX1 models.
($3E because the opcode of "LD A,n" is also $3E, so the instruction assembles to two identical bytes, which could help with compression ...)
You freak...
I've once done a reasonably responsive full-screen audio scope for MSX1 (separate, faster MSX2 version) showing input of a self-built sound sampler (real-time). So MSX1 VRAM access is fast enough. Too much work to dig up source for that (if I still have it), but some tips:
- Keep a copy of the pixel data in RAM (~6 KB) and initialize VRAM from it at the start. When you modify a pixel, update the RAM data and write only the modified byte to VRAM. That way you avoid the 'read' part of 'read VRAM - modify - write VRAM' type operations. Erasing pixels works the same way. The required VRAM access per pixel change then becomes "set VRAM address + write 1 byte" (use a smaller screen area for a faster effect). I'm not sure how I did this for MSX2 - maybe also screen 2 but with tighter timings.
- No waiting! During the delays required for VRAM access, do something else, like a piece of the calculation for the next pixel, or some logical / bit set operation on the byte that's to be written.
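The two tips above can be sketched roughly as follows. This is not the original scope code (that source was not posted). The names MIRROR and PSET_MIRROR, the buffer address, and the choice of inputs are all assumptions, and the EQU syntax may differ between assemblers.

```
; Hypothetical routine: set pixel bits via a RAM mirror, no VRAM read.
; In: HL = byte offset in the pattern table (0..#17FF)
;     B = bit mask of the pixel(s) to set
; Assumes: pattern table at VRAM #0000, MIRROR is a 6144-byte RAM copy
; of it, and interrupts are disabled by the caller.
MIRROR: EQU #C100 ; assumed free RAM (#C100..#D8FF), pick your own

PSET_MIRROR:
LD A,L
OUT (#99),A ; VRAM address low byte
LD A,H
OR #40 ; bit 6 set = write mode
OUT (#99),A ; VRAM address high byte
LD DE,MIRROR ; the mirror update runs while the VDP
ADD HL,DE ; latches the address: HL = byte in the RAM copy
LD A,(HL)
OR B ; set the pixel bit(s)
LD (HL),A ; keep the mirror up to date
OUT (#98),A ; write only the changed byte to VRAM
RET
```

The mirror lookup sits between the address setup and the data write, so the time that would otherwise be spent waiting is used for the RAM update.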
I don't know if this works or not... I also don't know if it is fast or not... feel free to analyze, it is just an idea... 
ORG #C000
DW 1,2,4,8,16,32,64,128
;PSET routine for SC2:
; Input:
;
; DE = Pointer to 1 byte of free RAM
; H = Y
; L = X
;
; Output: point in screen that is initialized in "even row, odd row"-style
; alias "big sprite compatibility mode" :
;
; 0 2 4 6 ... 62
; 1 3 5 7 ... 63
; 64 66 68 70 ... 126
; 65 67 69 71 ... 127
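PSET:
XOR A ; A = 0, so RRD leaves the high nibble of A clear
LD BC,#F99 ; B = #0F (nibble mask), C = #99 (VDP control port)
EX DE,HL ; HL = scratch byte, D = Y, E = X
LD (HL),D ; scratch = Y
RRD ; A = Y AND 15, scratch = Y >> 4
LD D,(HL) ; D = Y >> 4
EX DE,HL ; H = Y >> 4, L = X
ADD HL,HL ; H = Y7..Y4 X7, L = X6..X0 0
LD D,A ; D = Y3..Y0
LD A,B
AND L ; A = (X AND 7) * 2
LD E,A ; E = offset into the DW bit table
XOR L ; A = X6..X3 0000
XOR D ; A = X6..X3 Y3..Y0 (address low byte)
OUT (C),A
LD L,A
OUT (C),H ; address high byte, read mode
LD D,#C0
EX DE,HL ; HL = bit table entry, DE = VRAM address
IN A,(#98) ; read the current byte
OR (HL) ; set the pixel bit
OUT (C),E
SET 6,D ; same address, write mode
OUT (C),D
OUT (#98),A ; write the byte back
RET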
Hmm... I think I found a better way to draw a pixel on screen just by swapping two bits!
At least this takes less memory.
(Still purely theoretical untested code)
ORG #C000
DB 1,2,4,8,16,32,64,128
;PSET routine for SC2:
; Input:
;
; DE = Pointer to 1 byte of free RAM
; H = Y
; L = X
;
; Output: pixel in screen
; NOTE: Before using this routine name table should be formatted like this:
;
; A15 A8 A7 A0
; 00 RW 00 Y7 Y6 Y5 Y4 X3 | X7 X6 X5 X4 Y3 Y2 Y1 Y0
;
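PSET:
EX DE,HL ; HL = scratch byte, D = Y, E = X
LD (HL),D ; scratch = Y
LD A,E
AND 7
LD B,A ; B = X AND 7 (index into the bit table)
LD C,#99 ; VDP control port
XOR E ; A = X7..X3 000
RRD ; A = X7..X4 Y3..Y0, scratch = X3 000 Y7..Y4
OUT (C),A ; address low byte
LD E,A
LD A,(HL)
RLCA ; A = 000 Y7..Y4 X3
OUT (C),A ; address high byte, read mode
LD D,A
LD H,#C0
IN A,(#98) ; read the current byte
LD L,B ; HL = bit table entry
OUT (C),E
SET 6,D ; same address, write mode
OUT (C),D
OR (HL) ; set the pixel bit
OUT (#98),A ; write the byte back
RET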
Hmm, what do you mean when you say
; NOTE: Before using this routine name table should be formatted like this:
;
; A15 A8 A7 A0
; 00 RW 00 Y7 Y6 Y5 Y4 X3 | X7 X6 X5 X4 Y3 Y2 Y1 Y0
;
Is it the same as above, when you say
; Output: point in screen that is initialized in "even row, odd row"-style
; alias "big sprite compatibility mode" :
No, sorry I was not very clear...
I try to explain this a bit better:
With A0-A15 I mean the data that is sent to VDP (address setup)
With D0-D7 I mean the actual name table part, that can be altered.
With X0-X7 I mean pixel X-coordinate (note that X0-X2 select the bit inside a byte)
With Y0-Y7 I mean pixel Y-coordinate
By default (in BASIC) the name table looks like this:
; A15 A8 A7 A0
; 00 RW 00 Y7 Y6 Y5 Y4 Y3 | X7 X6 X5 X4 X3 Y2 Y1 Y0
; D7 D0
...
In the first example the name table required looks like this:
; A15 A8 A7 A0
; 00 RW 00 Y7 Y6 Y5 Y4 X7 | X6 X5 X4 X3 Y3 Y2 Y1 Y0
; D7 D0
...
For the later example it should be like this:
; A15 A8 A7 A0
; 00 RW 00 Y7 Y6 Y5 Y4 X3 | X7 X6 X5 X4 Y3 Y2 Y1 Y0
; D7 D0
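... I'm still not sure if this is clear, so here is a routine that should initialize the name table correctly for the later routine:
NameTableInit:
LD HL,#5800 ; name table at VRAM #1800, with the write bit set
LD C,#99
OUT (C),L
OUT (C),H ; set VRAM write address
DEC C ; C = #98, VDP data port
.LOOP
LD A,#5B
CP H
RET Z ; done after 768 entries
LD A,L
CALL .COMP ; Z if exactly one of bits 0 and 5 is set
LD A,L
JR NZ,.SKIP
XOR 33 ; swap bits 0 and 5
.SKIP
OUT (C),A
INC HL
JP .LOOP
.COMP
AND 33
DEC A
RET Z
CP 31
RET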
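As a small illustration of the "swapping two bits" idea: each name table entry written by NameTableInit is the low byte of its own position with bits 0 and 5 exchanged. A branch-free sketch of that swap for a single value might look like this. The label SWAP05 is hypothetical and the routine is not part of the code above.

```
; Hypothetical helper: swap bits 0 and 5 of A.
; In: A = position within a 256-entry bank
; Out: A = name table value for that position
; Changes: B, C
SWAP05:
LD B,A ; keep the original value
RLCA
RLCA
RLCA ; bit 5 is now in bit 0
XOR B ; bit 0 = bit5 XOR bit0 of the original
AND 1 ; A = 1 if the two bits differ, else 0
LD C,A
RRCA
RRCA
RRCA ; move that flag from bit 0 to bit 5
OR C ; A = 33 if the bits differ, else 0
XOR B ; flip both bits only when they differ
RET
```

For example, #01 becomes #20 and #21 stays #21.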
Actually you get 203 T states (by Jannone's BIT Assembler).
I think it is a world record!!!! (divik's one is 227 T states)
I'll try your code, but RetroTechie's idea of reserving 6K of RAM to avoid reading from VRAM is also very interesting.
Could your solution improve even more by exploiting a 6K RAM buffer that mirrors the VRAM?
