What do you mean R800 is 16-bit processor?

Page 3/5

By sjoerd

Hero (593)

22-04-2003, 11:56

>>Ok, in that case I'll explain<<
Thanx :)

>>The problem with all registers being usable as stackpointer is that you have a problem when there is an interrupt. This is why even RISC processors that CAN use all registers as stackpointer have an internal or 'system' stackpointer for interrupts<<
Sparc has a TPC register to store the current PC when a trap occurs. Everything else is left as an exercise for the programmer :) And most RISC processors don't use stacks for subroutines, but just link-registers. The cpu does not know that there is a stack.

>>- One register is hardwired to zero. [BS, I know, but almost all risc-designs I know of have it]<<
But it works to reduce the instruction set. NOP and MOVE can now be done with OR or ADD.

>>If there's a theory describing a pure RISC I still have to see it. However, RISC being a big pool of thoughts about processor design, there are plenty of ideas that aren't compatible with each other, like the stack pointer thingy...<<
Having no stackpointers looks riscy to me :) And starting an instruction every cycle does too.

>>>>You must know a trick I do not, because loops have not changed in speed on Z380.<<
>>You mean:

loop: ;...
DEC BC
LD A,C
OR B
JR NZ,loop

versus:

loop: ;...
DEC BC
LD HL,BC
ORW BC
JR NZ,loop

?
All I see in the latter is HL being destroyed, a register you are probably using in the loop body, and an increase in code-size. Maybe you can teach me? :)<<
Well, I thought about:
loop:
decw bc
jr nz,loop
but that won't work, I suppose. (Guessed decw bc would update the flags, the z380 being 16 bit LOL! )
Then I would go for:
loop:
addw -1
jr nz,loop
or even:
loop:
ex bc,hl
addw -1
ex bc,hl
jr nz,loop
or:
loop:
ex hl,hl'
- blablabla loopbody blablabla -
ex hl,hl'
addw -1
jr nz,loop
but that doesn't look that fast...
In your first loop A is destroyed. And probably HL is being used in the body, but being the 16 bit accumulator, HL is destroyed anyway, I guess. (It is in most of my loops). The increase in code-size doesn't bother me much (16MB, remember?).
Hmm. Where is that 16 bit djnz?

By anonymous

incognito ergo sum (109)

22-04-2003, 16:55

>>And most RISC processors don't use stacks for subroutines, but just link-registers. The cpu does not know that there is a stack.<<
- An array of link-registers is still a stack.
- An ARM uses a link-register too, but it still uses a stack for interrupts.

It HAS to work like that, or the user program can be corrupted by an interrupt.
Also there is a big disadvantage to using an array of link-registers instead of a memory-based stack. What if it runs out? Then you have a pretty big problem.
Anyway, for 'embedded' or specialized operation it's not a big issue, but OTOH in a general computer system with (nested) interrupts coming from everywhere... :/

>>But it works to reduce the instruction set. NOP and MOVE can now be done with OR or ADD.<<
NOP? Why would anyone want to use NOP except for padding, it doesn't do ANYTHING.
You need a MOVE REG,VALUE anyways, so the only thing having a 0-register does is shortening the codesize. OTOH XOR REG,REG is just as short probably...
Maybe I need practical examples ^^; Just from experience, I don't use 0 very often in a program.

>>(Guessed decw bc would update the flags, the z380 being 16 bit LOL! )<<
Nope, the INC/DEC is still the same as on Z80, except it has been expanded to 32 bit.

>>Then I would go for:
loop:
addw -1
jr nz,loop<<

Wouldn't work because you most definitely need HL preserved, as it's the 16-bit accumulator.

>>or even:
*snip*
or:
*snip snip*
but that doesn't look that fast...<<
It doesn't indeed.. EX instructions are really slow on Z380, they take 3 cycles (vs. 2 cycles for most others).

>>Hmm. Where is that 16 bit djnz?<<
^_^ Don't forget the 24-bit one :)

By sjoerd

Hero (593)

23-04-2003, 02:52

>>An array of link-registers is still a stack.<<
Sure, but I meant one link register.

>>An ARM uses a link-register too, but it still uses a stack for interrupts.<<
No, it does not. The programmer has to push the link register himself.

>>It HAS to work like that, or the user program can be corrupted by an interrupt.
Also there is a big disadvantage to using an array of link-registers instead of a memory-based stack. What if it runs out? Then you have a pretty big problem.<<
What if you get a page fault when you are trying to push your link address? Then you are in some serious problems...

>>Anyway, for 'embedded' or specialized operation it's not a big issue, but OTOH in a general computer system with (nested) interrupts coming from everywhere... :/<<
It is a big issue for embedded or specialized operation. In such systems there are some real time constraints to be met. No one cares if Windows crashes one or two times an hour. :)

>>But it works to reduce the instruction set. NOP and MOVE can now be done with OR or ADD.<<
>>NOP? Why would anyone want to use NOP except for padding, it doesn't do ANYTHING.<<
To fill branch delay slots, for instance. It's no use to use it for padding since all instructions are the same size anyway... And you can pad your data with any value you want. (But using zero makes sense, of course. Still, nop isn't always zero :) ).

>>You need a MOVE REG,VALUE anyways, so the only thing having a 0-register does is shortening the codesize. OTOH XOR REG,REG is just as short probably...<<
You do not need a MOVE REG,VALUE when you can do ADD r1,r0,5 to move 5 into r1. A MOVE register,register is very different from a MOVE register,value. And of course XOR R3,R3,R3 is as short as ADD R0,R0,R3. The 0-register is also used when you do not care about the outcome...
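
To make that concrete, here is a rough sketch in Python rather than any real instruction set. The three-operand ADD/OR, the register count and the helper names are all made up for illustration; the only point is that with r0 reading as zero, a load-immediate, a register move and a NOP fall out of ordinary ALU instructions:

```python
# Toy register file where r0 always reads as zero and ignores writes.
# Illustrative model only; it does not describe any specific RISC ISA.
class Regs:
    def __init__(self, n=8):
        self.r = [0] * n

    def read(self, i):
        return 0 if i == 0 else self.r[i]

    def write(self, i, value):
        if i != 0:                       # writes to r0 are discarded
            self.r[i] = value & 0xFFFFFFFF


def add(regs, rd, rs, imm):
    """ADD rd, rs, imm (the last operand is an immediate in this sketch)."""
    regs.write(rd, regs.read(rs) + imm)


def or_(regs, rd, rs, rt):
    """OR rd, rs, rt (register-register)."""
    regs.write(rd, regs.read(rs) | regs.read(rt))


regs = Regs()
add(regs, 1, 0, 5)      # "MOVE r1, 5"  written as ADD r1, r0, 5
or_(regs, 2, 1, 0)      # "MOVE r2, r1" written as OR  r2, r1, r0
or_(regs, 0, 0, 0)      # "NOP"         written as OR  r0, r0, r0
add(regs, 0, 1, 1)      # result not wanted: the destination is r0
```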

>>Maybe I need practical examples ^^; Just from experience, I don't use 0 very often in a program.<<
That's because you program z80,z380,r800 and arm. Having a register hardwired to zero isn't just to load that value into a register...

>>>>Hmm. Where is that 16 bit djnz?<<
>>^_^ Don't forget the 24-bit one :)<<
Or the 32 bit version...

But hey I might be oO now, of course. (So I may give some practical examples in the near future :D )

By anonymous

incognito ergo sum (109)

23-04-2003, 11:23

>>An ARM uses a link-register too, but it still uses a stack for interrupts.<<
Ok, but that's using the STACK POINTER. Even if it's a manual stack, it's still a stack, with a specialized stack pointer. You can pretend it's just a normal general purpose register, but it's not.

>>What if you get a page fault when you are trying to push your link address. Then you are in some serious problems...<<
What if your cat pisses on your mainboard. Then you are in some serious problems... oO

>>To fill branch delay slots, for instance. It's no use to use it for padding since all instructions are the same size anyway... And you can pad your data with any value you want. (But using zero makes sense, of course. Still, nop isn't always zero :) ).<<
Delay slots suck. It's a cheap hack to save circuitry in branch handling.
About nop, it's indeed not always zero. IIRC, it's EAh on 6502 and x86 doesn't even have a real NOP.

>>A MOVE register,register is very different from a MOVE register,value. And of course XOR R3,R3,R3 is as short as ADD R0,R0,R3. The 0-register is also used when you do not care about the outcome...<<
Why would you do an operation for which you don't care about the outcome?! :/ And if it was a general purpose register, you could just ignore it. I don't see the practicality in this...

>>>>Hmm. Where is that 16 bit djnz?<<
>>^_^ Don't forget the 24-bit one :)<<
Actually, there is no 32 bit version! LOL

>>But hey I might be oO now, of course. (So I may give some practical examples in the near future :D )<<
Bah, never mind now... This thread has become way off-topic anyway and doesn't seem to be going anywhere anymore ^^;

By sjoerd

Hero (593)

25-04-2003, 01:13

>>Ok, but that's using the STACK POINTER. bla ba bla...<<
No, that's storing the link register in memory. You can use any general purpose register for that as A stack pointer. Risc == No specialized stack pointer. Specialized stack pointer == risc based. :) I guess you still think of the ARM7TDMI as the ultimate risc cpu.
I tried very hard to see your point (right ;) ), so think about this one. Risc == No specialized stack pointer. The reasons ARM advises R13 to be used as a stack pointer are the dsp origin (context switching) and the thumb mode, still there is no hardware stack support... (AFAIK, I have to be careful here :P ).

>>What if your cat pisses on your mainboard. Then you are in some serious problems... oO<<
Hmm, I like my own jokes better... ;)

>>Delay slots suck. It's a cheap hack to save circuitry in branch handling.
About nop, it's indeed not always zero. IIRC, it's EAh on 6502 and x86 doesn't even have a real NOP.<<
Delay slots are a cheap hack to execute a useful instruction instead of stalling... The first MIPS did not stall anyway. It is not a hack; all the first risc cpu's have this feature...
And in modern cpu's it is very stupid to not have a real nop.

>>Why would you do an operation for which you don't care about the outcome?! :/ And if it was a general purpose register, you could just ignore it. I don't see the practicality in this...<<
Intel did it with the Itanium, so there might be a chance that there is no practicality...

>>>>Hmm. Where is that 16 bit djnz?<<
>>^_^ Don't forget the 24-bit one :)<<
I was looking for that 16-bit counter version, not the 16 or 24 bit jumps...

>>Bah, never mind now...<<
I won't. I think it is stupid to waste a resource like registers, so I can not defend it very well.

I think the newMSX should get a vliw-cpu. Risc is something of the past. Epic has the future. Risc is used to describe everything else... (whahaha).

However, I still got the feeling that having a pipeline is enough for you to call a cpu risc-based... :D

>>This thread has become way off-topic anyway and doesn't seem to be going anywhere anymore ^^;<<
So it did seem to be going anywhere? LOL!

Back on topic: R800 == 8 bit. >:) With some 16 bit instructions for address calculations. 8) And these 16 bit instructions use a 16 bit ALU. Everybody happy. LOL!

By anonymous

incognito ergo sum (109)

25-04-2003, 15:10

>>No that's storing the link register in memory. You can use any general purpose register for that as A stack pointer.<<
No you can't, coz you'll foul up whatever register you are using. The link register and stack pointer are swapped with alternate registers when an interrupt occurs. So you are pretty much forced to use the stack pointer as an actual stack pointer, making it a manual stack, not just 'storing in memory'.

>>I guess you still think of the ARM7TDMI as the ultimate risc cpu.<<
Indeed I like the ARM architecture (not just ARM7TDMI :evil: :) ) very much, but the fact that it's one of the most popular processors in the world and highly regarded as an efficient RISC CPU is the actual reason I keep bringing it up.

>>Delay slots are a cheap hack to execute a useful instruction instead of stalling... The first MIPS did not stall anyway. It is not a hack; all the first risc cpu's have this feature...<<
Is it or isn't it a hack? You are contradicting yourself here.

>>Intel did it with the Itanium, so there might be a chance that there is no practicality...<<
LOL, good one :)

>>I was looking for that 16-bit counter version, not the 16 or 24 bit jumps...<<
Doesn't exist, would be cool tho...

>>I won't. I think it is stupid to waste a resource like registers, so I can not defend it very well.<<
Bah, then we are in agreement?! WTF were you babbling about before then?

>>I think the newMSX should get a vliw-cpu. Risc is something of the past. Epic has the future. Risc is used to describe everything else... (whahaha).<<
I think you listened too much to Intel marketing. VLIW might be potentially powerful (if executed right), but it's no fun for assembly programming. ARM is very fun for assembly programmers, so it's a great choice for the new MSX. Any processor that's not easily programmable in ASM falls outside of the MSX philosophy, as far as I'm concerned.

>>However, I still got the feeling that having a pipeline is enough for you to call a cpu risc-based... :D<<
How did that thought ever get into your mind? That's almost an insult to my intelligence ^^;

>>Back on topic: R800 == 8 bit. >:) With some 16 bit instructions for address calculations. 8) And these 16 bit instructions use a 16 bit ALU. Everybody happy. LOL!<<
And you say you're not stubborn? Besides, I don't know what kind of programs you write, but I tend to use the 16 bit instructions (don't forget INC/DEC belong to those too) for a lot more than that!

By sjoerd

Hero (593)

25-04-2003, 15:46

>>No you can't, coz you'll foul up whatever register you are using. The link register and stack pointer are swapped with alternate registers when an interrupt occurs. So you are pretty much forced to use the stack pointer as an actual stack pointer, making it a manual stack, not just 'storing in memory'.<<
You're right as long as it concerns ARM. As I said, more pure Risc processors do not have specialized stack registers.

>>Indeed I like the ARM architecture (not just ARM7TDMI :evil: :) ) very much, but the fact that it's one of the most popular processors in the world and highly regarded as an efficient RISC CPU is the actual reason I keep bringing it up.<<
OK, but you are trying to prove your point concerning stack pointers using ARM. There are lots of risc cpu's that don't force you to use a 'special' register as a stackpointer.

>>Delay slots are a cheap hack to execute a useful instruction instead of stalling... The first MIPS did not stall anyway. It is not a hack; all the first risc cpu's have this feature...<<
No hack.

>>Bah, then we are in agreement?! WTF were you babbling about before then?<<
It was a great idea to reduce the instruction set, but modern processors need better hints. It is not very smart anymore to execute a sub r2,r2,r2 to load 0 into r2, however in past cpu's it didn't matter. Sorry for the confusion oO

>>I think you listened too much to Intel marketing. VLIW might be potentially powerful (if executed right), but it's no fun for assembly programming.<<
Don't worry. And Risc in general isn't fun for assembly programmers anyway...

>>ARM is very fun for assembly programmers, so it's a great choice for the new MSX. Any processor that's not easily programmable in ASM falls outside of the MSX philosophy, as far as I'm concerned.<<
Very true. I am with you here :)

>>However, I still got the feeling that having a pipeline is enough for you to call a cpu risc-based... :D<<
Just a feeling... :)

>>And you say you're not stubborn? Besides, I don't know what kind of programs you write, but I tend to use the 16 bit instructions (don't forget INC/DEC belong to those too) for a lot more than that!<<
OK, I AM stubborn when I think I am right :P
And I write that kind of program that's never finished... And do you mean I can use INC HL instead of:
push af
ld a,l
add a,1
ld l,a
jr nz,_hup
ld a,h
add a,1
ld h,a
_hup
pop af
8) :D

By Grauw

Enlighted (7968)

28-04-2003, 04:52

Heh, wanted to reply here too, I guess ^_^.
(just making Snout happy)

>>The use of 16 bit loops is faster on z380,<<

Nonsense. I can program a 16-bit loop on Z80 which is just as fast as an 8-bit loop. Same technique can be used for 24- and 32-bit loops. For more info check http://map.tni.nl/?p=articles/fast_loops.html
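
Roughly, the idea is to let an 8-bit counter do the inner counting with DJNZ and keep a second 8-bit counter for the high byte, so the common path is a single DJNZ. Here is a small Python model of that. The function name is hypothetical and the variables only mirror the Z80 register names; it checks the iteration count, not the timing:

```python
def fast_loop_iterations(count):
    """Model of a 16-bit loop built from two 8-bit counters.

    Z80 setup being modelled (count in DE, 1..65535):
        ld b,e / dec de / inc d
    then the loop body, followed by:
        djnz loop / dec d / jp nz,loop
    """
    assert 1 <= count <= 0xFFFF
    b = count & 0xFF               # ld b,e
    de = (count - 1) & 0xFFFF      # dec de
    d = ((de >> 8) + 1) & 0xFF     # inc d (8-bit, may wrap to 0)
    iterations = 0
    while True:
        iterations += 1            # the loop body would run here
        b = (b - 1) & 0xFF         # djnz: decrement B ...
        if b != 0:                 # ... and jump back while B != 0
            continue
        d = (d - 1) & 0xFF         # dec d
        if d == 0:                 # jp nz,loop falls through
            return iterations
```

Only one pass in every 256 pays for the extra DEC D / JP NZ; all the others are just the DJNZ.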

>>loading and storing 16 bit values in memory is faster, logical operations are faster on z380. Some problems are 16 bit in essence, creating an 8 bit solution does not make it any faster. The use of more registers makes z380 optimized code faster because you'll have to use the memory a lot less. (Okay, I know: There are no algorithms that need more than 8 registers).<<

Ok then, the Z80/R800 has those useful instructions called EX and EXX. With those really fast instructions (1 t-state) you can in total address eight 16-bit registers (and I read the exx-es on the Z380 were really slow, 3 t-states? then you'd better not use them often). Since you already said that that was the max register usage... Also from my own experience, only few routines need more registers than I have available, and those are usually not the critical ones inside loops, which matter most cuz they get repeated often. So in effect the additional registers are easy, probably offer a tiny bit faster code, but no significant advantage, speed-wise.

>>The point here was: R800 == 8 bit. Not: R800 is not RISC. (But it is not, of course).
But I really would like to know: What makes the R800 risc-based? Why is the z380 more cisc-based?<<

I don't agree. R800 is definitely 16-bit, it's the ALU's reach that matters, *not* the external interface, and the ALU handles everything on a 16-bit level. If it didn't, there wouldn't even be a single base of reasoning to claim it's a 16-bit processor (which they did). Of course you can verify this with the original R800 design schematics if you want. (blah). As I said, the external interface doesn't matter shit, it's just the means in which it is connected to the 'outside world'. And if you need an external 16-bit bus that much, it also has a 16-bit external interface. When using OUT (C),r, it outputs the value of C to address lines 0-7, B to address lines 8-15 and r to data lines 0-7. So basically, if you use the common 8-bit I/O addressing space, you can add the value of B to the 8 data lines, and get a 16 bit bus. It's just a matter of wiring and programming things a little differently.

Furthermore, I myself don't know all the details, but I do know that Guyver has very extensive experience with both the Z380 and the R800, being that he programmed the Gameboy Emulator for MSX (GEM) and the Z380 equivalent GEMZ (one of the few actual Z380 applications existing for MSX! And definitely the most advanced one), which makes it a perfect fit for comparison, and him the perfect candidate for doing so. This is also one of the numerous examples where the code can't detach itself from the 8-bits. The numbers I heard from Guyver were that GEM on an R800 currently runs almost as fast as GEMZ on the Z380. It can still improve and optimizations can still be made, however it is a long way to go from 'just as fast at double clock speed' to 'just as fast at same clock speed'.

Also, don't forget that on the turboR, really a lot of instructions can be executed in 1 T-state. The minimum instruction time on the Z380 is 2 T-states (which is also often met). That fact alone already almost nullifies the entire '16-bit effect', which usually basically offers a shorthand for 2-instruction R800 equivalents. Then take into account that if say, your code is 50% 16-bit (and that's a lot) and 50% 8-bit, that with the common instructions the R800 is 25% faster. This might be compensated with the real power Z380 instructions a bit, but it doesn't happen too often, and as Guyver said, only with extensive use of the multiplication and division instructions it is a realistic scenario that the Z380 is faster.
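
Spelled out as a quick calculation, assuming the rough figures above: 1 T-state per simple R800 instruction, a 16-bit operation costing two such instructions, 2 T-states for any Z380 instruction, and the same clock speed. These are estimates from this post, not measurements, and the function name is made up:

```python
def avg_tstates(frac_16bit, r800_8bit=1, r800_16bit=2, z380_any=2):
    """Average T-states per operation for a given share of 16-bit work."""
    frac_8bit = 1.0 - frac_16bit
    r800 = frac_8bit * r800_8bit + frac_16bit * r800_16bit
    z380 = z380_any                # same cost for 8- and 16-bit operations
    return r800, z380


r800, z380 = avg_tstates(0.5)      # 50% 8-bit, 50% 16-bit code
saving = 1 - r800 / z380           # fraction of T-states the R800 saves
```

With the 50/50 mix this gives 1.5 against 2 T-states per operation, which is where the 25% comes from.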

About RISC, I don't know the exact details about it, but afaik the ARM is one of the ultimate examples of a RISC processor, ne? Or at least a very cool, commonly used one. Ok, I know the ARM :). Does the THUMB mode of the ARM processor make it any less RISC? Just think of the R800 as an ARM continuously running in THUMB mode ^_^.

~Grauw

By Grauw

Enlighted (7968)

28-04-2003, 05:04

One more thing:

>>- has no microcode (check!) [OK, I did not know that. But are you sure? See instructions like LDIR?]<<

Well, I don't know what microcodes are, but if LDI hasn't got it, LDIR definitely hasn't. All LDIR adds to LDI is to conditionally (based on the P/V flag) increase the program counter by 2.

LDI does:
(de)=(hl)
hl++
de++
bc-- (sets P/V flag if not 0)
pc=pc+2

LDIR does:
(de)=(hl)
hl++
de++
bc-- (sets P/V flag if not 0)
if not(P/V) then pc=pc+2

Doesn't add much functionality of which I'd say 'that doesn't make it RISC'... ARM can execute almost every instruction conditionally, so I highly doubt that conditional PC register increase matters...
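
For reference, a small Python model of the two instructions, assuming the standard Z80 behaviour (HL and DE both count up, BC counts down, P/V stays set while BC is non-zero). The dictionary of registers and the function names are just for illustration:

```python
def ldi(mem, regs):
    """One LDI step: copy a byte, bump HL and DE, count BC down.

    Returns the P/V flag, which is set while BC is still non-zero.
    """
    mem[regs["de"]] = mem[regs["hl"]]
    regs["hl"] = (regs["hl"] + 1) & 0xFFFF
    regs["de"] = (regs["de"] + 1) & 0xFFFF
    regs["bc"] = (regs["bc"] - 1) & 0xFFFF
    return regs["bc"] != 0


def ldir(mem, regs):
    """LDIR: the same step, re-executed for as long as P/V is set."""
    steps = 1
    while ldi(mem, regs):          # PC is not advanced, so LDI runs again
        steps += 1
    return steps


mem = bytearray(0x10000)
mem[0x4000:0x4004] = b"MSX!"
regs = {"hl": 0x4000, "de": 0x8000, "bc": 4}
steps = ldir(mem, regs)            # copies 4 bytes from 4000h to 8000h
```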

~Grauw

By sjoerd

Hero (593)

28-04-2003, 13:31

LOL!
>>Ok then, the Z80/R800 has those useful instructions called EX and EXX. With those really fast instructions (1 t-state) you can in total address eight 16-bit registers (and I read the exx-es on the Z380 were really slow, 3 t-states? then you'd better not use them often).<<
Still faster than push/pop/ld from memory.

>>Since you already said that that was the max register usage... Also from my own experience, only few routines need more registers than I have available, and those are usually not the critical ones inside loops, which matter most cuz they get repeated often. So in effect the additional registers are easy, probably offer a tiny bit faster code, but no significant advantage, speed-wise.<<
Somehow I always need more registers... And that '8 reg max' is based on an old bullshit study.

>>R800 is definitely 16-bit, it's the ALU's reach that matters, *not* the external interface, and the ALU handles everything on a 16-bit level. If it didn't, there wouldn't even be a single base of reasoning to claim it's a 16-bit processor (which they did).<<
I do not care about alu-sizes, external interfaces or whatever. Everything about r800 feels 8 bit. As I said: I see why GuyveR800 (and many others, of course) says R800 is 16 bit. I remember Sega going: 'Dreamcast is 128-bit.' Right.

>>About RISC, I don't know the exact details about it, but afaik the ARM is one of the ultimate examples of a RISC processor, ne?<<
Far from it.

>>Or at least a very cool, commonly used one. Ok, I know the ARM :).<<
8)

>>Does the THUMB mode of the ARM processor make it any less RISC? Just think of the R800 as an ARM continuously running in THUMB mode ^_^.<<
Of course thumb does make it less risc. And all that shifting doesn't help either. Having said that, I like ARM. It is a nice example of a combination of risc and cisc.
