Assember: 16bit bubble sort

Страница 2/5
1 | | 3 | 4 | 5

By yzi

Champion (444)

Аватар пользователя yzi

26-07-2011, 19:48

"In place" means that it doesn't allocate extra memory for temporary buffer/array.

Nobody mentioned radix sort yet. I think Amiga coders used to use that a lot. http://en.wikipedia.org/wiki/Radix_sort

Yeah, and bucket sort! That one was my favourite back in the day. http://en.wikipedia.org/wiki/Bucket_sort

By NYYRIKKI

Enlighted (5897)

Аватар пользователя NYYRIKKI

26-07-2011, 20:16

Similar speed optimization for 16bit version...

	;Input BC = Number of words to sort (Min 2)
	LD BC,(DATAE-DATAS)/2

INSERT16:
	DI
	LD HL,DATAS	; Data to be sorted out
	LD (.STORSP),SP
	LD IX,0
	DEC BC
	ADD HL,BC
	ADD HL,BC
	EXX
.LOOPO:
	INC IX
	LD B,IXH
	LD C,IXL
	EXX
	CPD
	JP PO,.EXIT
	DEC HL
	LD SP,HL
	EXX
	POP HL
.LOOPI:
	POP DE
	LD A,D
	CP H
	JR C,.SWAP
	LD A,E
	CP L
	JR NC,.LOOPO
.SWAP:
	PUSH HL
	PUSH DE
	CPI
	POP DE
	POP HL
	JP PE,.LOOPI
	JP .LOOPO
.EXIT:
	LD SP,0
.STORSP: EQU $-2
	RET

; Example data:

DATAS:	DW 0,9,2,7,6,2,8,1,1,4
DATAE:

By NYYRIKKI

Enlighted (5897)

Аватар пользователя NYYRIKKI

26-07-2011, 21:26

Never heard of counting sort, at least not under this name.

I think he means something like this:

>>,[>>,]<<[[-<+<]>[>[>>]<[.[-]<[[>>+<<-]<]>>]>]<<]

(This example of sort algorithm is made in BrainFuck)

By WORP3

Paladin (835)

Аватар пользователя WORP3

27-07-2011, 10:36

Yet another error in 8bit version... I did forget to POP BC on exit.

Ok, now here is already pretty well optimized 8bit insertion sort:

(I don't know why I do this to my self... The very first version already did all that I needed but now I'm stuck in optimizing these stupid routines)

Hahaha, it's all for a good cause, and you are doing a fine job too Wink

By pitpan

Prophet (3145)

Аватар пользователя pitpan

27-07-2011, 10:42

Love BF!

By PingPong

Prophet (3833)

Аватар пользователя PingPong

27-07-2011, 15:39

QuickSort? but is more complex and one should select the right pivot ...

By enribar

Paragon (1155)

Аватар пользователя enribar

27-07-2011, 17:45

hehehe insertion_sort, bubble_sort, merge_sort, quick_sort, radix_sort, counting_sort, bucket_sort, shell_sort...
Time and/or space complexity, best-case, worst-case, average-case, stable and not stable algorithms, in-place algs...
The good and "brainfuck" university old days Hannibal

I have an idea: instead of spending time on sorting algorithms, why we can't spend time in B Trees or B+ Trees data structures?
I remember a very good basic book to start building a real DBMS for MSX, "Foundamentals of database systems" - Elmasri, Navathe.

By wouter_

Champion (470)

Аватар пользователя wouter_

27-07-2011, 18:59

Ok, now here is already pretty well optimized 8bit insertion sort:

	;Input BC = Number of bytes to sort (Min 2)
	LD BC,DATAE-DATAS

INSERT8:
	LD HL,DATAS ; Data to be sorted out
	ADD HL,BC
	EXX
	LD IX,0
	DB #3E
.NextSort:
	LD (DE),A
	EXX
	CPD
	JP PO,.EXIT
	PUSH HL
	EXX
	POP HL
	INC IX
	LD B,IXH
	LD C,IXL
	LD D,H
	LD E,L
	DEC DE
	LD A,(DE)
	CP (HL)
	JR NC,.NextSort+1
.FAST:	LDI
	JP PO,.NextSort
	CP (HL)
	JP C,.FAST
	JP .NextSort

.EXIT	RET

; Example data:

DATAS:	DB 0,9,2,7,6,2,8,1,1,4
DATAE:


Great work NYYRIKKI. I took your routine as a starting point and optimized it some more. Though I had to add some (minor) restrictions.

First I assume it's OK to put a sentinel value at the end of the buffer (add an extra byte with value 0 right after the to-be-sorted buffer). This allows to remove the end-of-buffer test (JP PO,.NextSort) from the fast-loop. Now the value of register BC in that loop doesn't matter anymore. And that in turn allows to remove quite a few instructions (all instructions that use the IX register).

Because of the simpler control flow in the fast-loop it's now also possible to get rid of your 'db #3E'-hack (to skip the next 1-byte instruction).

(detail) The instruction 'JP PO,.EXIT' can be replaced with 'RET PO'.

In the next step I want to get rid of the BC register in the outer loop. This register is (only) used to detect when we're finished sorting. But if we remove BC we need an alternative way to detect when we're done. I choose to align the buffer at a 256-byte boundary and add a restriction that the buffer size must be smaller than 256 bytes (I think this is a very reasonable restriction ... if your buffer is bigger than 256 bytes, the insertion-sort algorithm will anyway be too slow). With these constraints we can detect the start of the buffer by comparing the low-byte of the address to zero.

This second step enables another chain of simplifications. I'll only show the final result:

Sort8:	LD   HL,BufE     ; 1 past end of buffer

	LD   D,H         ;
next:	LD   E,L         ;
	DEC  E           ; DE = HL - 1
	
	; invariants:
	;   DE == HL - 1
	;   sub-range [DE .. BufE-1] is sorted
loop1:  DEC  L           ;
	RET  Z           ; all sorted?
	DEC  E           ; extend sub-range
	LD   A,(DE)      ; A = new element
	CP   (HL)        ; 
	JR   NC,loop1    ; extended subrange still sorted?

	; not sorted anymore .. insert new element at correct position
	PUSH HL          ;
loop2:	LDI              ; (DE) <- (HL) ; ++HL ; ++DE ; --BC 
	CP   (HL)        ; because of the sentinel value, this will never run
	JP   C,loop2     ;   past the end of the buffer
	LD   (DE),A      ; store new element at correct position
	POP  HL          ;
	JP   next        ;

	.align 256
BufS:	db 0,9,2,7,6,2,8,1,1,4
BufE:	db 0             ; sentinel value, must be 0

Ok, one more minor tweak. The PUSH/POP instructions are relatively expensive. It's possible to gain 5 cycles with this hack (the maximum buffer size is now 255 (not 256)).

	...
	; not sorted anymore .. insert new element at correct position
	LD   B,L         ; save L for later
	LD   C,255       ; this makes sure register B won't change in the loop below
loop2:	LDI              ; (DE) <- (HL) ; ++HL ; ++DE ; --BC
	CP   (HL)        ; because of the sentinel value, this will never run
	JP   C,loop2     ;   past the end of the buffer
	LD   (DE),A      ; store new element at correct position
	LD   L,B         ; restore L
	JP   next        ;
	...

Some ideas to tweak this code even more: at the label 'next' we copy register L to E and then decrement both L and E. Maybe this order can be swapped? Or maybe only on the code path that goes via the last 'JP Next' instruction? If (right before the loop2 label) register C is initialized to 0 instead of 255, B is decremented for free. Though I don't yet see how these ideas can be integrated with the end-of-routine test (RET Z instruction). NYYRIKKI, any ideas?

By wouter_

Champion (470)

Аватар пользователя wouter_

27-07-2011, 20:33

Ok, I managed to improve the routine a little more by adding a 2nd sentinel at the beginning of the buffer. This routine is faster than the previous one (also) because the begin-of-buffer-reached check is skipped for most iterations (the JR NC,loop1 jump is most often not taken, that's also why it's a JR instead of JP).

Sort8:	LD   HL,BufE-1   ; last element
	LD   D,H         ;
	LD   E,L         ;

	db   #01         ; LD BC,nn .. skip next 2 instructions
loop1:  DEC  L           ;
	RET  Z           ; all sorted?
next:	DEC  E           ; extend sub-range
	LD   A,(DE)      ; A = new element
	CP   (HL)        ; 
	JR   NC,loop1    ; extended subrange still sorted?

	LD   B,L         ; save L for later
	LD   C,0         ; this makes sure B decrements by exactly 1 in the next loop
loop2:	LDI              ; (DE) <- (HL) ; ++HL ; ++DE ; --BC
	CP   (HL)        ; because of the sentinel value, this will never run
	JP   C,loop2     ;   past the end of the buffer
	LD   (DE),A      ; store new element at correct position
	LD   L,B         ; restore L (minus 1)
	LD   E,B         ;
	JP   next        ;

	.align 256
	db 255           ; sentinel value, must be 255
BufS:	db 0,9,2,7,6,2,8,1,1,4
BufE:	db 0             ; sentinel value, must be 0

By WORP3

Paladin (835)

Аватар пользователя WORP3

27-07-2011, 22:19

Wow i never could have guess that my remark on the bubble sort had such an impact. It seems that everyone is thinking on improving the sorting routine Big smile

Great work everyone, their are a lot of nice routines coming up !

@wouter, can you explain the db #01 and the remark that it would be a LD BC,nn ? Where is the nn defined or is it really just to skip the dec l and the ret z, but then why not use just a jp as a jp has the same execution time (10 cycles), or am i missing something ?

Never ever did take a look at it but it seems that a JR actually is taking more cycles then a JP, that's strange ?

Страница 2/5
1 | | 3 | 4 | 5