|
| | There are 159 guests and 2 MSX friends online
You are an anonymous user.
|
| |
Author
| Crystal clean PCM 8bit samples on the poor PSG
| ARTRAG msx master Posts: 1563 | Posted: December 29 2005, 13:09   | Ok this is time to share a nice routine for playing 8bit PCM samples on the PSG without the ugly spikes that are generated by the previous conventional techniques that play sample by sample each PCM sample using a 3 channels table.
Go to http://map.tni.nl/articles/psg_sample.php for details on the “standard” solution.
In a word, the standard solution associates each 8bit sample to a triplet of channel levels suitably chosen in order to minimize the error of the PSG overall output level with respect to the given sample. As the sum of the 3 channels leads to 608 disjoint values, you would expect a the output quality of a 9bit quantizer, but, unfortunately, the resulting levels are oddly spaced, so the theorical quality corresponds in dB to about 48dB of SNR, which is turn is equivalent to a 8bit uniform quantizer.
Problem:
The “standard” solution has a very annoying problem: the z80 cannot change the 3 PSG channels at the same time, this implies that, during the transition between two successive samples, the PSG passes trough undefined states where some channels are from the previous triplet and some channels are from the triplet corresponding to the next sample.
The result is that, at each transition between two successive samples, the PSG can produce 1 or 2 spikes of short duration. This effect, repeated at each sample transition, becomes a very annoying wideband noise that sensibly reduce the overall quality of the output, far below the optimal 48dB of SNR.
Solution:
The sole solution is to control the PSG trough every transition.
As the z80 can change a single channel at time and each channel has 16 values, the PSG can be modelled at any time as a state machine with 256 states, 16 inputs, and 608 outputs.
The 256 states are the 16*16 values of the old channels (those that don’t change at the current time), the 16 inputs are the allowed values of the current channel, the 608 outputs are the sum of the volume values from the “state” and of the volume value from the current input.
Each sample transition corresponds to three steps of the state machine.
Assume ABC the current values playing and A’B’C’ the new values.
The transitions are:
ABC -> ABC’
ABC’-> AB’C’
AB’C’-> A’B’C’
So, between the level ABC corresponding to the previous sample and the level A’B’C’ corresponding to the next sample, we get :
ABC
ABC’
AB’C’
A’B’C’
Assume x be the sequence of input samples, we want to minimize the mean square error between the x and the output sequence of levels in any single transition, this implies we need to develop a search algorithm that explores all the sequences of state transitions of the PSG corresponding to the allowed outputs, and finds the best sequence of channel levels that minimize the mean square error.
The algorithm is already existing, and is the classical Viterbi algorithm, I do not run into details here, but in case I will go back on this classical topic.
The output of such an algorithm is the sequence of levels for each channel that encodes the input sequence x.
Now the new problem: each triplet is 4*3= 12 bit : a huge waste of space!
Solution run length encoding.
The best solution should be a RLE based on nibbles, as some "detail channels" can have very long bursts of non equal symbols. Due to lack of time and of energy I did as you can see in the following.
I know that the implementation could be improved, but I cannot fit any a better solution in 447 t cycles (8KHz of PCM player).
Enjoy
| | ARTRAG msx master Posts: 1563 | Posted: December 29 2005, 13:10   |
Fs equ 8192 ; alternative 8000 or 8192
;------------------
MACRO PsgW reg,value
ld a,reg ; select AY channel A volume register ; 7+1
out (#A0),a ; 11+1
ld a,value ; 7+1
out (#A1),a ; and write back to the AY ; 11+1
ENDM ; total 36+4 = 40T
;------------------
MACRO WaitXX
if (Fs==8000)
.3 nop ; 3*5 = 15
else
if (Fs==8192)
nop ; 5
else
assert 0
endif
endif
ENDM ; total 15 or 5
MACRO Wait68
.2 ex (sp),hl ; 2*20 =40
.4 nop ; 4*5 = 20
cp 0 ; 8
ENDM ; total 68
;------------------
MACRO next reg
ld a,(hl) ; 8
.4 rrca ; 4*5
and 15 ; 8
ld reg,a ; 5
ld a,(hl) ; 8
and 15 ; 8
exx ; 5
ld reg,a ; 5
exx ; 5
inc hl ; 7
ENDM ; tot = 79
;------------------
OUTPUT wavplay.com
ORG 100h
START:
di
PsgW 0,255
PsgW 1,255
PsgW 2,255
PsgW 3,255
PsgW 4,255
PsgW 5,255
PsgW 6,255
PsgW 7,10111111B
;-------------------------------------
; IN HL - Encoded sample start address
; HL'- Sample length (#pcm saples)
;-------------------------------------
ld hl, SAMPLE+2
ld bc,#0101
ld de,bc
exx
ld hl, (SAMPLE)
ld c,#A1
.LOOP:
exx ; 5
dec b ; 5
ld a,15 ; 8
and b ; 5
jp nz,waitA ; 11
next b ; 79
1 dec d ; 5
ld a,15 ; 8
and d ; 5
jp nz,waitB ; 11
next d ; 79
2 dec e ; 5
ld a,15 ; 8
and e ; 5
jp nz,waitC ; 11
next e ; 79
3 exx ; 5
ld a,8 ; 8
out (#a0),a ; 12
inc a ; 5
out (c),b ; 14
out (#a0),a ; 12
inc a ; 5
out (c),d ; 14
out (#a0),a ; 12
out (c),e ; 14
WaitXX ; 15 or 5
dec hl ; 7
ld a,h ; 5
or l ; 5
jp nz,.LOOP ; 11
; tot = 447T or 437T
ret
waitA Wait68 ; 68
jp 1B ; 11
; tot = 79
waitB Wait68 ; 68
jp 2B ; 11
; tot = 79
waitC Wait68 ; 68
jp 3B ; 11
; tot = 79
; b7b6b5b4|b3b2b1b0
; NumTick |LevVolum
SAMPLE:
include "C:\MATLAB6p5\work\out.txt"
FINISH:
| | ARTRAG msx master Posts: 1563 | Posted: December 29 2005, 13:14   | To whom is interested in going deeper in the way the Viterbi optimization is performed
[y,Fs,nbit,opt]=wavread('bluetooth_input.wav');
%load handel
y = (y-min(y)) /(max(y)-min(y)) * 1.15; % input tra 0 e 1.15
f = 3579545;
fpcm = Fs;
tp = f/fpcm; % 447 = 8000Hz, 437 = 8192Hz
dt1 = (12+5+14)/tp;
dt2 = (12+14)/tp;
dt3 = 1-dt1-dt2;
N = 3*length(y);
x = zeros(1,N);
x(1:3:end) = y;
x(2:3:end-3) = y(1:end-1)*(1-dt1)+y(2:end)*dt1;
x(3:3:end-3) = y(1:end-1)*(1-dt1-dt2)+y(2:end)*(dt1+dt2);
x(end-2:end)= y(end);
n=[0:15]';
vol=2.^(n/2)/2^7.5;
vol(1)=0;
nxtS = uint8(zeros(256,16));
for i=0:15
for j=0:15
for in=0:15
nxtS(i*16+j+1,in+1) = uint8(j*16 + in);
end
end
end
curV = zeros(256,16);
for i=0:15
for j=0:15
for in=0:15
curV(i*16+j+1,in+1) = vol(i+1)+vol(j+1)+vol(in+1);
end
end
end
Stt = uint8(zeros(256,N));
Itt = uint8(zeros(256,N));
L = zeros(256,1);
St = uint8(ones(256,1));
It = uint8(ones(256,1));
for t =1:N
Ln = inf*ones(256,1);
if mod(t-1,1000)==0
proc = fix(t/N *100)
end
for cs=0:255
for in=0:15
cv = curV(cs+1,in+1);
ns = double(nxtS(cs+1,in+1));
switch mod(t-1,3)
case 0
Ltst = L(cs+1)+dt1*abs(x(t)-cv)^2;
case 1
Ltst = L(cs+1)+dt2*abs(x(t)-cv)^2;
case 2
Ltst = L(cs+1)+dt3*abs(x(t)-cv)^2;
end
if Ln(ns+1) >= Ltst
Ln(ns+1) = Ltst;
St(ns+1) = cs;
It(ns+1) = in;
end
end
end
L = Ln;
Stt(:,t) = St;
Itt(:,t) = It;
end
[l,i] = min(L);
P = uint8(zeros(1,N));
I = uint8(zeros(1,N));
P(N) = Stt(i,N);
I(N) = Itt(i,N);
for t = (N-1):-1:1
P(t) = Stt(double(P(t+1))+1,t);
I(t) = Itt(double(P(t+1))+1,t);
end
V = zeros(1,N);
for t = 1:N
V(t) = curV(double(P(t))+1,double(I(t))+1);
end
er = sum(abs(x(1:3:end)-V(1:3:end)).^2)*dt1+...
sum(abs(x(2:3:end)-V(2:3:end)).^2)*dt2+...
sum(abs(x(3:3:end)-V(3:3:end)).^2)*dt3
en = sum(abs(x(1:3:end)).^2)*dt1+...
sum(abs(x(2:3:end)).^2)*dt2+...
sum(abs(x(3:3:end)).^2)*dt3
disp('SNR db=')
SNR = 10*log10(en/er)
j=1;
A = 0;
B = 0;
C = 0;
la = 1;
lb = 1;
lc = 1;
ja=1;
jb=2;
jc=2;
s = [];
for i=1: length (I)
switch mod(i-1,3)
case 0
if A==I(i)
la=la+1;
if la==17
s(ja)=16*16+double(A);
la=1;
ja=j;
j=j+1;
end
else
s(ja)=16*la+double(A);
la=1;
ja=j;
j=j+1;
A=I(i);
end
case 1
if B==I(i)
lb=lb+1;
if lb==17
s(jb)=16*16+double(B);
lb=1;
jb=j;
j=j+1;
end
else
s(jb)=16*lb+double(B);
lb=1;
jb=j;
j=j+1;
B=I(i);
end
case 2
if C==I(i)
lc=lc+1;
if lc==17
s(jc)=16*16+double(C);
lc=1;
jc=j;
j=j+1;
end
else
s(jc)=16*lc+double(C);
lc=1;
jc=j;
j=j+1;
C=I(i);
end
end
end
disp('actual len=')
disp(length(s))
I0 = I(1:3:end);
I1 = I(2:3:end);
I2 = I(3:3:end);
i0 = find(diff(double(I0)));
str0=[];
for j=0:(length(i0)-1)
if j==0
t = I0(1:i0(j+1));
else
t = I0(i0(j)+1:i0(j+1));
end
str0(j+1,1) = length(t);
str0(j+1,2) = I0(i0(j+1));
end
i1 = find(diff(double(I1)));
str1=[];
for j=0:(length(i1)-1)
if j==0
t = I1(1:i1(j+1));
else
t = I1(i1(j)+1:i1(j+1));
end
str1(j+1,1) = length(t);
str1(j+1,2) = I1(i1(j+1));
end
i2 = find(diff(double(I2)));
str2=[];
for j=0:(length(i2)-1)
if j==0
t = I2(1:i2(j+1));
else
t = I2(i2(j)+1:i2(j+1));
end
str2(j+1,1) = length(t);
str2(j+1,2) = I2(i2(j+1));
end
disp('best len=')
disp(length(str0)+length(str1)+length(str2))
disp ('ratio =')
disp(length(s)/(length(str0)+length(str1)+length(str2)))
% i=20 ; [ bitand(s(1:i)',240)/16 bitand(s(1:i)',15) ]
% [ str0(1:10,:) str1(1:10,:) str2(1:10,:) ]
t = zeros(N,1);
t(1:3:end) = 0 + [1:N/3];
t(2:3:end) = dt1 + [1:N/3];
t(3:3:end) = dt1 + dt2 + [1:N/3];
figure(1)
subplot(2,1,1)
plot(t,V,t,x,'r',t,x,'.g')
subplot(2,1,2)
plot(t,V-x)
np = fix(sum(fix(s/16))/3)-6;
[fid, MESSAGE] = fopen('out.txt','W');
fprintf(fid,' dw %6d\n',np);
fprintf(fid,' db %3d\n',s);
fclose(fid);
% ASM replayer simulator
Vp=[];t=1;
ChA=0;ChB=0;ChC=0;
b=1;d=1;e=1;
hl=1;
hlp = np;
while (hlp>0)
b=b-1;
if b==0
b = fix(s(hl)/16);
bp = bitand(s(hl),15);
hl=hl+1;
end
d=d-1;
if d==0
d = fix(s(hl)/16);
dp = bitand(s(hl),15);
hl=hl+1;
end
e=e-1;
if e==0
e = fix(s(hl)/16);
ep = bitand(s(hl),15);
hl=hl+1;
end
ChA = bp;
Vp(t) = vol(ChA+1)+vol(ChB+1)+vol(ChC+1); t=t+1;
ChB = dp;
Vp(t) = vol(ChA+1)+vol(ChB+1)+vol(ChC+1); t=t+1;
ChC = ep;
Vp(t) = vol(ChA+1)+vol(ChB+1)+vol(ChC+1); t=t+1;
hlp = hlp-1;
end
figure(2)
plot(V)
hold on;
plot(Vp,'.r')
| | ro msx guru Posts: 2287 | Posted: December 29 2005, 13:27   | kewl, I think. 'ave to test it. but sound kewl. thanx Atrag
| | ARTRAG msx master Posts: 1563 | Posted: December 29 2005, 13:33   | I have sent two small examples at 'downloads@msx.org'
and I am preparing some others short encodedd wavs
| | ro msx guru Posts: 2287 | Posted: December 30 2005, 08:13   | looking forward 2 it!
| | ARTRAG msx master Posts: 1563 | Posted: December 31 2005, 16:40   | I have done a good number of tests and I have found that the main problem
of this approach is the size of the data.
My RLE imlementation is very poor as my data (at least some channels) can
be very irregular.
Sometimes the result, wrt an 8bit input, is 1.6 times the input, i.e. it is worst than
not encoding at all, that results 1.5 the original lenght....
I have found also that if I trade off SNR with # transition the size can drastically
decrease to 0.9 times the original sive but the SND tends to reduce of 2-3dB.
I will post soon the new algorithm for trading size with SNR.
| | snout
 msx legend Posts: 4991 | Posted: December 31 2005, 21:22   | Sorry for the delay on putting the files online, it have been extremely busy days... as soon as I recover from my new-years hangover, I'll put 'em in our downloads section!  | | ARTRAG msx master Posts: 1563 | Posted: January 01 2006, 05:55   | Take your time.
With jltursan we discovered that the msdos pcmenc.exe could not work on all the PC
due to the need of a .dll from matlab.
Probably (also for increasing the processing speed) it would be required a new version
of pcmenc.exe developed directly in C (I compiled the matlab program).
| | dvik msx master Posts: 1299 | Posted: January 01 2006, 10:21   | Hi ARTRAG,
Is the ideas in your new PSG sample player something totally new or a refinement of what was discussed in http://www.msx.org/forumtopic663p105.html ? It sounds like its somewhat new in the sense you time it quite well and then using an adaptive way of changing samples. I'm really looking forward to hear the result. | | ARTRAG msx master Posts: 1563 | Posted: January 01 2006, 22:56   | This is a further development more that a refinement.
The approach is completely different from the one we developed (that one was a "sample by sample" encoding,
this one is a "sequence" encoding) but I have planned this solution (the fact that the sequence of PSG levels can
be found with a Viterbi algorithm) when we discovered the "spike" problem.
If you are interested send me an email and I'll send you some stuff to have a
taste of the whole thing.
| | dvik msx master Posts: 1299 | Posted: January 01 2006, 23:06   | Yeah, I remember we talked about having a more dynamic model than just a static table but iirc we didn't get any better results back then. I remember you talked about Viterbi back then too (dont think it was tested though) so I figured your new algorithm is doing something different. If you send a test version I'll try it both in emus and on real MSXes.
My email is: daniel AT vik DOT cc
Do you have any guess on how sensitive it is to timing, i.e. is it good enough if the average frequency is 8kHz or does each individual sample need to be spaced exactly 1/8000 seconds? The reason why I wonder this is if it would be possible to use it in a demo like waves.
Another thing that would be very cool is to write a mod player using this engine. Not sure if its possible but it would be fun to give it a try.
| | ARTRAG msx master Posts: 1563 | Posted: January 01 2006, 23:22   | Package sent!
| | dvik msx master Posts: 1299 | Posted: January 01 2006, 23:27   | Great! Thanks  | | ARTRAG msx master Posts: 1563 | Posted: January 01 2006, 23:31   | The algorithm has a heavy encoding part to be performed with a PC and a light z80 player
that uses RLE and does playback at 8KHz.
Actually unrolling the replayer in 8 branches, you can get on statistic basis "a lot" of spare time without
the need to jitter the sample time.
I was thinking to use the spare time for implementing a better encoding, now the principal problem is the data size.
Each sample needs 3*4 levels, without encoding you get an expansion of 1,5 times (encoding PCM at 8bit).
My RLE implementation is poor, unique values waste a full byte (instead of a nibble), so in average the compression
can be 1.2-1.6 (Yes even worst than without enconding)
This is the reason why i am working to a system for trading quality and higher compression (on the basis of a tunable parameter).
Now the quality ranges from 42dB to 36dB of SNR
while the encoding ratio wrt the original file can vary. If you start from 8bit PCM in the worst
case the RLE expand the result to 1.5 -1.6 times.
(consistent with the fact I encode PSG levels (3x4bit levels) and with the fact that my RLE implementation is very poor.)
With the new solution I am working on, acting on an optimization parameter, I can reduce the SNR increasing the compression.
In this case you can get also a reduction of the size having a final ratio of 0.8 or 0.9 and a
lower SNR of about 35dB.
| |
| |
| |
| |