About C / Z80 optimizations (SDCC)

Page 11/15
4 | 5 | 6 | 7 | 8 | 9 | 10 | | 12 | 13 | 14 | 15

By PingPong

Prophet (3413)

12-09-2019, 10:46

zPasi wrote:
ARTRAG wrote:

The fact is that Hi-Tech C evolved from 1984 (the first CP/M version) to 2001 (the last MS-DOS version), and was abandoned around the same year. It was the effort of a single private company, but it is the result of 18 years of development.
I cannot say whether it still generates the best code under all circumstances, but certainly no other current project has such a long history of improvements and evolution.

Really? SDCC has been around since 1999, perhaps longer. 20+ years :)

And I must say the progress in the quality of the generated code is somewhat unimpressive. Even with 20+ years ;-)

By Timmy

Expert (102)

12-09-2019, 13:12

So even at page 11, SDCC and z88dk are still superior! That's what you call progress.

*Grabs popcorn*

In the meantime, I'm working on something too. Hopefully a new game. :)

By PingPong

Prophet (3413)

12-09-2019, 13:28

It has been shown that SDCC's "superior" performance is due to the math library, not the compiler.
But for MSX development, a fast 32-bit library matters less than a compiler whose inefficiency doesn't waste the gains one can get from highly optimized assembly code.
What is the benefit of heavily optimizing an asm routine to gain, say, 300 cycles if the compiler wastes 100 or more T-cycles just to pass some parameters when calling it? Or forces me to follow the "ALL PARAMETERS ON STACK" rule even when it isn't necessary?
Then I may as well throw this compiler in the bin and write everything in assembly.....

The reason z88dk is "superior" is not the compiler; rather, about 90% of what gets executed runs in highly optimized asm libraries.
But when we write code, library functions are not all we use: program logic, resource access, etc. depend on the specific platform and the developer.

There is a comparison somewhere between z88dk's old C compiler and SDCC. The old compiler (previously described as fast) is so poor that SDCC outperforms it in every test, and the article recommends using SDCC instead of the old compiler.

Yet before SDCC was an option in z88dk, the compiler had a reputation for being FAST.

The FAST came from the libraries; the compiler was only acting as glue between a bunch of library functions.

By Timmy

Expert (102)

12-09-2019, 13:58

I'm really glad you're looking into the technical capabilities of the different compilers. :) However, that is not how I would define 'superior'. But I don't mind if you continue down that path.

PingPong wrote:

What is the benefit of heavily optimizing an asm routine to gain, say, 300 cycles if the compiler wastes 100 or more T-cycles just to pass some parameters when calling it? Or forces me to follow the "ALL PARAMETERS ON STACK" rule even when it isn't necessary?
Then I may as well throw this compiler in the bin and write everything in assembly.....

As I said before in another thread, some people are better off writing everything in assembly. In that thread, I also pointed out that your expectations require fully hand-optimised assembly code. That's absolutely fine. Your expectations are not wrong; in fact, you are absolutely right!

By PingPong

Prophet (3413)

12-09-2019, 16:51

Timmy wrote:

I'm really glad you're looking into the technical capabilities of the different compilers. :) However, that is not how I would define 'superior'. But I don't mind if you continue down that path.

As I said before in another thread, some people are better off writing everything in assembly. In that thread, I also pointed out that your expectations require fully hand-optimised assembly code. That's absolutely fine. Your expectations are not wrong; in fact, you are absolutely right!

No, it's only that we are working on an 8-bit Z80 CPU with a relatively slow clock.
We don't have the luxury of gigahertz to spare, so every single drop of CPU power should be used.
C is useful because some things are hard to do in asm.
So we expect to pay some performance penalty in exchange for easier coding, but it should be a reasonable penalty,
and a bad C compiler is what turns "reasonable" into "unacceptable".

By zPasi

Champion (410)

13-09-2019, 09:10

PingPong wrote:

You cannot say "I see nothing". If you compile both of ARTRAG's sources and measure the time taken to draw a circle

I can say I saw nothing because that's true :)

I compared the asm outputs and saw nothing significant. Couldn't run the binaries because that would have required the right libraries for both compilers.

Quote:

I will prepare some benchmarks to see what the overall result is.

Please do. That will be interesting.

By PingPong

Prophet (3413)

13-09-2019, 09:15

zPasi wrote:
PingPong wrote:

You cannot say "I see nothing". If you compile both of ARTRAG's sources and measure the time taken to draw a circle

I can say I saw nothing because that's true :)

I compared the asm outputs and saw nothing significant. Couldn't run the binaries because that would have required the right libraries for both compilers.

That can't be true; there should be a clear winner. It is unlikely that they perform equally.

By zPasi

Champion (410)

13-09-2019, 15:06

Meanwhile, I've found the holy grail of SDCC optimizations!

SDCC uses a system called a peephole optimizer. I've learned that in addition to the built-in peephole rules, you can write your own! And it is easy: you just write the rules in a text file! No need to rebuild SDCC.

For example, for some reason SDCC does this:

  ld c,l
  ld b,h
  ld (_some_var), bc

You'll write a rule like this:

replace restart{
	ld	c,l
	ld	b,h
	ld	(%1), bc
} by {
	ld	(%1), hl
}

Then just add --peep-file [my peep file name] to the SDCC compiler options.

And voila! You'll get

  ld (_some_var), hl

Now, it is possible that some later code expects that value to be in bc. No problem, just append if notUsed('bc') to the rule. So it becomes:

replace restart{
	ld	c,l
	ld	b,h
	ld	(%1), bc
} by {
	ld	(%1), hl
} if notUsed('bc')

So far I have already optimized away most of the annoying "stupidity". 8-)

Of course, not everything can be done this way, but a lot can.

By PingPong

Prophet (3413)

13-09-2019, 16:09

zPasi wrote:

Meanwhile, I've found the holy grail of SDCC optimizations!

SDCC uses a system called a peephole optimizer. I've learned that in addition to the built-in peephole rules, you can write your own! And it is easy: you just write the rules in a text file! No need to rebuild SDCC.

I already know about peephole optimization. It serves mainly to remove redundant instructions that a compiler can generate where the code emitted for two consecutive instructions meets. It serves to eliminate things like:

ld a, h
ld h,a
ld a,h
call somefunc

into

ld a, h
call somefunc

But it is only one form of optimization, and unfortunately it helps only marginally in producing better code. It is mainly used to remove the noise compilers generate between instructions. There are many other optimizations that can affect code quality much more.

In the Z80 scenario, a very good register allocator together with some form of O.R. is the major factor in generating better code. That is, the compiler should decide whether it is more convenient to generate one machine instruction sequence or another, also weighing the requirement to use specific registers.

This is not an easy task because of the non-orthogonal nature of the Z80 instruction set.
For example, to access variables in a contiguous block of memory, is it better to use the index registers, or to point via the DE register and do some math to adjust DE's value?

It depends on the context; there is no single answer to this question.

By zPasi

Champion (410)

13-09-2019, 22:48

PingPong wrote:

But it is only one form of optimization, and unfortunately it helps only marginally in producing better code.

Whatever, but there's no more need to talk about things like this:

	ld	a, x
	push	af
	inc	sp
	ld	a, y
	push	af
	inc	sp

The rules have to be written once, then these are gone. Problem solved.

You can even rule this:

	ld	d, (hl)
	ld	e,#0x98
	push	de
	call	_OutPort
	pop	af

to this:

	ld	a,(hl)
	out	(#0x98),a
Quote:

In the Z80 scenario, a very good register allocator together with some form of O.R. is the major factor in generating better code.

There aren't that many general-purpose registers in the Z80, so how much can that do?
