Unrolled strlen

By bore

Expert (115)

bore's picture

07-04-2019, 02:14

OK, this is a bit of a nerdy* subject but I was giving the strlen() function from the standard C library some thought.

If one were to implement it on Z80 the natural solution would be to use the cpir instruction with some wrapper code:

; size_t strlen(const char *s);
strlen:
	xor a		;5
	ld c,a		;5
	ld b,a		;5
	cpir		;23/18
	ld hl,-1	;11
	sbc hl,bc	;17
	ret		;11

Execution time will be 72+n*23 cycles where n is the length of the string in characters.

So, the natural question is: Could you make it any faster?
Loop unrolling is usually the way to go, but with conditional code it seldom gives you an advantage since the branch often can be incorporated in the loop.

Unrolled code would look something like this:

	REPT	16
	cp (hl)		; 8
	jr z,.strend	; 8/13
	inc hl		; 7
	ENDR

You still get 23 cycles per character but larger code size and at some point you still have to loop.

However, if you look through the Z80 instruction set you will notice that the fastest conditional branch in the Z80 instruction set isn't jr at 8 cycles but the rarely used conditional return at 6 cycles.
What happens if we use it for an unrolled loop?

strlen:
	xor a		;5
	ld c,l		;5
	ld b,h		;5
	cp (hl)		;8
	call nz,loop	;18/11
	sbc hl,bc	;17
	ret		;11

N EQU 16

loop:
	REPT N
	inc hl		;7
	cp (hl)		;8
	ret z		;6/12
	ENDR
	jp loop	;11

OK, so figuring out how many cycles it will cost for each length is a bit trickier here.
The first thing to notice is the conditional call. It costs as much as a non-conditional call but gives an early exit for 0-length string.
At 62 cycles it is actually faster than the cpir version, but that is not really surprising since we never enter any loop.
What about a string that is only one character long?
Well, you get some call overhead and end up at 96 cycles instead of the 95 cycles of the cpir version.
After that you get an additional cost of 21 cycles per character. That is 2 cycles faster than cpir.
To make it faster you need to recuperate the cost of the jp instruction.
That means that as long as you repeat 6 or more times the loop expanded version will be faster than the cpir based one.

There is one thing that seems a bit iffy about the loop expansion.
At the end of it you have a conditional return followed by a non-conditional jump.
For strings lengths that ends at that particular character that is the fastest option. For longer strings you would want to adjust it somewhat.

strlen:
	xor a		;5
	ld c,l		;5
	ld b,h		;5
	cp (hl)		;8
	call nz,loop	;18/11
	sbc hl,bc	;17
	ret		;11

N EQU 16-1

loop:
	REPT N
	inc hl		;7
	cp (hl)		;8
	ret z		;6/12
	ENDR
	inc hl		;7
	cp (hl)		;8
	jp nz,loop	;11
	ret		;11

This version costs a bit more if the string ends at the same time as the loop unrolling but it costs 6 cycles less per loop compared to previous version.

I don't really think anyone needs to quickly do a strlen on many long strings on a Z80 and I can't really think of any other function where this is useful.
I just didn't want to be the only one to waste some time on this.

* Don't say I didn't warn you.

Login or register to post comments

By Rogerup

Resident (36)

Rogerup's picture

07-04-2019, 05:23

Very clever.

I have a routine that needs strlen for long strings, but in my case I just need to know if the string is bigger than n characters, so I solved the problem creating the strnlen function.

If I need a whole and fast solution, now I have one to use.

Congratulations.

By ericb59

Paladin (828)

ericb59's picture

07-04-2019, 07:32

Thank your for sharing this kind of useful function Bore. I think I will implement it inside FUSION-C ;-)

By Grauw

Enlighted (7961)

Grauw's picture

07-04-2019, 14:28

Unrolling consumes more memory though, and I think it’s unlikely this function is in a code path where those 2 cycles per iteration really matter. So @ericb59, what I mean to say is, I don’t think the standard library should have an unrolled implementation, it will reduce the memory available for program code with little benefit.

By ricbit

Champion (429)

ricbit's picture

07-04-2019, 23:41

I like this idea, going to use it in the future, thanks.

By NYYRIKKI

Enlighted (5250)

NYYRIKKI's picture

08-04-2019, 05:12

Yes, very nice read, but I just wanted to add that in general if you expect that you need string length, it is usually better to calculate it during string generation time and then just store it as header before actual string.

My MSX profile