suggestion on optimization

Page 2/5

By ARTRAG

Enlighted (6242)

02-04-2019, 08:05

I confirm that the second DJNZ is out of range. Thanks anyway; the code now looks like this:

	struct sat
y		db	0
x		db	0
f		db	0
c		db	0
	ends


;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
;	plot enemies and bullets if visible
;
;	depends on xmap,ymap

_plot_enemy:

	ld	iy,(alt_ram_sat)
	ld	ix,enemies
	ld	bc,max_enem*256+0
	
	ld	hl,(ymap)
	ld	de,-16
	add	hl,de
	ld	(tempy),hl

	ld	hl,(xmap)
	ld	de,-32
	add	hl,de
	ld	(tempx),hl

.npc_loop1:
	res 7,(ix+enemy_data.status)	; set it as invisible
	bit 0,(ix+enemy_data.status)
	jp	z,.next

	ld	l,(ix+enemy_data.y+0)
	ld	h,(ix+enemy_data.y+1)
	ld	de,(tempy)
	and a
	sbc hl,de		; hl = enemy.y + 16 - ymap <0
	jp	m,.next		; enemy.y - ymap < -16

	ld	de,128+16
	sbc hl,de		; enemy.y - ymap + 16 - 128 - 16 >= 0 
	jp	nc,.next	; enemy.y - ymap  >= 128
	ld	e,128+64
	add	hl,de
	ld	(iy+sat.y),l
	ld	(iy+sat.y+4),l	; not needed if single layer but in this way it is overall faster 
	
	ld	l,(ix+enemy_data.x+0)
	ld	h,(ix+enemy_data.x+1)
	ld	de,(tempx)
	and a			
	sbc hl,de		; hl = enemy.x + 32 - xmap < 0
	jp	m,.next		; hl <0  <==> dx = enemy.x - xmap < -32
	
	ld	de,32
	sbc hl,de		; enemy.x + 32 - xmap - 32 <0

	ld	a,(ix+enemy_data.color)
	jp nc,.noec		; -32< dx <0
	or	128			; set EC
	add	hl,de		; add 32
.noec
	ld	e,a
	ld	a,h
	and a
	jp	nz,.next	; dx >255
	
	ld	a,(ix+enemy_data.frame)
	ld	(iy+sat.x),l				; write X
	ld	(iy+sat.f),a				; write shape
	ld	(iy+sat.c),e				; write colour
	inc c
	set 7,(ix+enemy_data.status)	; set it as visible
	cp	16*4					; hard coded in the SPT
	jp	nc,.two_layers

.one_layer:

	ld	e,sat
	add iy,de
	; jp 	.next
		
.next:
	ld	de,enemy_data
	add ix,de
	djnz	.npc_loop1

	ld	a,c
	ld	(alt_visible_sprts),a
	ret
	
.two_layers:
	
	ld	(iy+sat.x+4),l				; second layer X
	add	a,4
	ld	(iy+sat.f+4),a				; second layer shape
	ld	a,e
	and 0xF0
	inc	a						; second layer is black
	ld	(iy+sat.c+4),a	
	inc c
	ld	e,2*sat
	add iy,de
	jp 	.next

By RetroTechie

Paragon (1563)

03-04-2019, 05:52

Do you have an idea of how often (on average) each conditional jump inside the loop is taken? If so:
- If the condition is met and the jump is taken in most cases, an absolute jump (as used) is faster than a relative jump. BUT
- If the condition is NOT met and the jump falls through in most cases, a relative jump is 3 T-cycles faster (and 1 byte shorter).

Also, if one common case covers the vast majority of iterations, you could arrange the code so that this common case is a sequence of non-taken relative jumps (with the slower, taken JRs confined to the less common cases). Note that conditional JPs can only be replaced with JRs for Carry or Zero tests.
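
For reference, a small sketch of the trade-off using the standard Z80 timings. On MSX, JP and JR each cost one more T-state because of the M1 wait state, and .rare is a hypothetical label:

	; option 1: absolute jump, 3 bytes
	jp	z,.rare		; 10 T-states, taken or not

	; option 2: relative jump, 2 bytes
	jr	z,.rare		; 7 T-states if not taken, 12 if taken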

Then this snippet:

	and a			
	sbc hl,de		; hl = enemy.x + 32 - xmap < 0

Is the AND A there to reset the carry flag? Right before it there's an ADD HL,DE with only LDs in between. Depending on the number ranges being added, perhaps that ADD HL,DE can be trusted never to produce a carry? If so, CF is guaranteed to be zero at that point and the AND A can be removed.

By ARTRAG

Enlighted (6242)

03-04-2019, 07:42

Thanks, but the jump is taken more often than it falls through. Moreover, I am using the minus (M) condition, which JR does not support.
Every AND A where CF was already reset has been removed.

By bore

Expert (115)

03-04-2019, 09:57

If the first jump is the most common you can place the .next directly after it to save a few cycles.

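.npc_loop1:
	res 7,(ix+enemy_data.status)	; set it as invisible
	bit 0,(ix+enemy_data.status)
	jr	nz,.not_next		; object in use: go do the visibility tests
.next:
	ld	de,enemy_data
	add ix,de
	djnz	.npc_loop1
	; (loop exit code and ret go here)
.not_next:
	; (visibility tests and SAT writes, ending with a jump back to .next)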

But that looks like it would make the worst case worse, while optimizing for the case where you have fewer active sprites.
Keeping the code as it is but changing the first jp to a relative jump would optimize the worst case.

Regarding the range checks one thing you can do is:
If you want to check if x1 <= hl < x2
1) Add -x2 to hl (The first addition makes hl become negative if it is less than x2)
2) Add x2-x1 to hl (The second addition moves it back to positive if it is more or equal to x1)
This will set the carry if hl is in range.
The result will be hl-x1
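
Written out for a 16-bit value in HL, this could look like the sketch below. X1 and X2 stand for assembly-time constants with X1 < X2, and .out_of_range is a hypothetical label; DE is clobbered.

	ld	de,-X2
	add	hl,de		; hl = value - X2 (wraps below 0 if value < X2)
	ld	de,X2-X1
	add	hl,de		; hl = value - X1, carry set if X1 <= value < X2
	jr	nc,.out_of_range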

I'm not sure if it will save you anything since you will need to calculate two temp variables, but it could be worth a try.

By bore

Expert (115)

03-04-2019, 10:16

Wait, x2-x1 is constant in your case, so you can probably save a whole bunch of cycles that way.
Both by replacing sbc with add and by merging two jumps into one.

By bore

Expert (115)

03-04-2019, 12:27

Clarification:

	ld	hl,-128
	ld	de,(ymap)
	and	a
	sbc	hl,de
	ld	(tempy),hl	; tempy = -(ymap + 128)

and

	ld	l,(ix+enemy_data.y+0)
	ld	h,(ix+enemy_data.y+1)
	ld	de,(tempy)

	add	hl,de		; hl = enemy.y - (ymap + 128)
	ld	de,128+16
	add	hl,de		; hl = enemy.y - ymap + 16
	jr	nc,.next	; !(-16 <= enemy.y - ymap < 128)

	ld	a,l
	add	a,64-16		; a = enemy.y - ymap + 64
	ld	(iy+sat.y),a
	ld	(iy+sat.y+4),a	; not needed if single layer but in this way it is overall faster 

By ARTRAG

Enlighted (6242)

04-04-2019, 21:12

Bore, your solution is pure genius and works like a charm!
Thanks!!! This is the current version:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
;	plot enemies and bullets if visible in the current SAT in ram
;
;	depends on xmap,ymap

_plot_enemy:

	ld	iy,(alt_ram_sat)
	ld	ix,enemies 
	ld	bc,(max_enem + max_plyr_bullets + max_enem_bullets)*256+0
	
	ld	hl,-128
	ld	de,(ymap)
	and a
	sbc	hl,de
	ld	(tempy),hl

	ld	hl,(xmap)
	ld	de,-32
	add	hl,de
	ld	(tempx),hl

.npc_loop1:
	res 7,(ix+enemy_data.status)	; set it as invisible
	bit 0,(ix+enemy_data.status)
	jp	z,.next

	ld	l,(ix+enemy_data.y+0)
	ld	h,(ix+enemy_data.y+1)
	ld	de,(tempy)
	
	add	hl,de		; hl = enemy.y - (ymap + 128)
	ld	de,128+16
	add	hl,de		; hl = enemy.y - ymap + 16
	jr	nc,.next	; !(-16 <= enemy.y - ymap < 128)

	ld	a,l
	add	a,64-16		; a = enemy.y - ymap + 64	
	ld	(iy+sat.y+0),a
	ld	(iy+sat.y+4),a	; not needed if single layer but in this way it is overall faster 
	
	ld	l,(ix+enemy_data.x+0)
	ld	h,(ix+enemy_data.x+1)
	ld	de,(tempx)
	and a			
	sbc hl,de		; hl = enemy.x + 32 - xmap < 0
	jp	m,.next		; hl <0  <==> dx = enemy.x - xmap < -32
	
	ld	de,32
	sbc hl,de		; enemy.x + 32 - xmap - 32 <0

	ld	a,(ix+enemy_data.color)
	jp nc,.noec		; -32< dx <0
	or	128			; set EC
	add	hl,de		; add 32
.noec
	ld	e,a
	ld	a,h
	and a
	jp	nz,.next	; dx >255
	
	ld	a,(ix+enemy_data.frame)
	ld	(iy+sat.x),l				; write X
	ld	(iy+sat.f),a				; write shape
	ld	(iy+sat.c),e				; write colour
	ld	(ix+enemy_data.plane),c		; save SAT plane
	inc c
	set 7,(ix+enemy_data.status)	; set it as visible
	cp	16*4					; hard coded in the SPT
	jp	nc,.two_layers

.one_layer:

	ld	e,sat
	add iy,de
	; jp 	.next
		
.next:
	ld	de,enemy_data
	add ix,de
	djnz	.npc_loop1

	ld	a,c
	ld	(alt_visible_sprts),a
	ret
	
.two_layers:
	
	ld	(iy+sat.x+4),l				; second layer X
	add	a,4
	ld	(iy+sat.f+4),a				; second layer shape
	ld	a,e
	and 0xF0
	inc	a						; second layer is black
	ld	(iy+sat.c+4),a	
	inc c
	ld	e,2*sat
	add iy,de
	jp 	.next

By ARTRAG

Enlighted (6242)

04-04-2019, 21:32

Actually, I still do not get why the jr nc test also covers this condition:
enemy.y - ymap < 128

By bore

Expert (115)

05-04-2019, 01:03

If enemy.y - ymap is 128 or more, the first addition doesn't bring hl below 0, so the second addition doesn't set the carry.

Essentially, the first addition shifts the valid range down so that its top sits at -1.
Then the second addition adds the width of the valid range, so that every value of hl that was in the valid range crosses the 0 boundary and sets the carry.
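
As a worked check for the y test, with tempy = -(ymap + 128) and the second addition adding 128+16 = 144, the boundary values come out like this (16-bit results in hex):

	dy = enemy.y - ymap    after 1st add    after 2nd add    carry
	-17                    FF6F             FFFF             no
	-16                    FF70             0000             yes
	127                    FFFF             008F             yes
	128                    0000             0090             no
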
The same method can also be used for efficient whitespace checking, since BS, TAB, LF, VT, FF and CR all fall in one contiguous range (codes 8 to 13).
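
A minimal sketch of that 8-bit variant, assuming the character to test is in A (which gets clobbered) and that .not_space is a hypothetical label:

	add	a,-14		; codes 8..13 become 250..255
	add	a,6		; only those six values wrap past 255 and set carry
	jr	nc,.not_space	; carry clear: not BS, TAB, LF, VT, FF or CR
	; here a = original code - 8 (0..5)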

By bore

Expert (115)

05-04-2019, 01:18

Am I correct in assuming that bit 0 in the status byte is used for enemy/bullet allocation?

What is the visibility bit used for? Is it really necessary to clear it for objects that aren't allocated?
If you reset bit 7 at the same time as you reset bit 0, you won't need to reset it for unallocated objects on every pass through the loop.
Then you can move the reset after the first jp, so that objects that aren't in use no longer pay for those cycles.
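
A sketch of that reordering, assuming bit 7 is also cleared wherever an object is freed (that code is not shown in the thread):

.npc_loop1:
	bit 0,(ix+enemy_data.status)
	jp	z,.next				; free slot: bit 7 is already clear
	res 7,(ix+enemy_data.status)	; object in use: invisible until proven visible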

Also, regarding the allocation bit:
Wouldn't it be more efficient to keep track of the number of active objects instead?
Removal of an object would then cost a bit more since you need to copy the last object into the removed object slot, but allocation would be faster since you don't need to scan the array.
You would also not need to check bit 0 in this loop since you then know the number of objects in use.

If it is necessary that enemies and bullets are in separate arrays you can still achieve that by calling this function multiple times with ix and b as parameters.
The multiple calls will likely cost less than checking for bit 0 in every object.
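
A rough sketch of such a removal, assuming two hypothetical variables that are not in the posted code: num_active (a byte holding the number of objects in use) and last_obj (a pointer to the last object in use). IX points at the object being removed; BC, DE and HL are clobbered.

	ld	hl,(last_obj)	; hl = source: last object in use
	push	ix
	pop	de		; de = destination: slot being freed
	ld	bc,enemy_data	; structure size, as used in the loop
	ldir			; copy the last object into the freed slot
	ld	hl,(last_obj)
	ld	bc,-enemy_data
	add	hl,bc		; step last_obj back by one structure
	ld	(last_obj),hl
	ld	hl,num_active
	dec	(hl)		; one object fewer in use

If the removed object is itself the last one, the LDIR just copies it onto itself, which is harmless. Allocation would then be the reverse: step last_obj forward by one structure and increment num_active.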
