Limitations with MSX-Music

Page 5/5

By Grauw

Ascended (8931)

31-12-2019, 13:34

sdsnatcher73 wrote:

On a side note, what card/device do you use to capture the MSX audio and video?

I use an Elgato Cam Link 4K, which works well with my OSSC SCART->HDMI converter and my Mac.

alexito wrote:

Good job, Grauw. Very good achievement; now we can create better demos or games. On the turboR this technique can give us up to two PCM channels if H-INT and the timer are used. Am I right?

With this technique you can have 9 PCM channels in theory. The frequency is limited by the YM2413: if you write both address and data, it requires a 12 + 84 cycle wait per channel. That makes the maximum sample frequency 37287 Hz / number of channels. For a single channel the maximum frequency is 42614 Hz, because you can omit changing the register address.
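
The arithmetic behind those two figures can be sketched as follows. This is only an illustration: the 3579545 Hz clock is an assumption (the standard MSX clock, not stated in this post), and the function name is hypothetical.

```cpp
#include <cassert>

// Assumption: the wait cycles are counted at the standard MSX clock of 3579545 Hz.
constexpr double kClockHz = 3579545.0;
constexpr int kAddressWait = 12;  // wait after writing the register address
constexpr int kDataWait = 84;     // wait after writing the register data

// Upper bound on the per-channel sample rate for a given number of PCM channels.
// With one channel the address is written once up front, so only the data wait counts.
double maxSampleRateHz(int channels) {
    assert(channels >= 1 && channels <= 9);
    int cyclesPerSample = (channels == 1)
        ? kDataWait
        : channels * (kAddressWait + kDataWait);
    return kClockHz / cyclesPerSample;
}
```

For one channel this gives about 42614 Hz, and for N channels about 37287 / N Hz, matching the numbers above.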

I use the horizontal blank flag (HR) for timing. It is not possible to use the horizontal blank interrupt because it cannot fire on every line. On the turboR you could indeed also use either the system timer or the PCM timer, and use its PCM DAC in combination. In all cases it requires 100% CPU because it needs to constantly poll the timer. However, with a single PCM channel there’s plenty of headroom to do other things such as that waveform effect.

By Grauw

Ascended (8931)

02-01-2020, 23:07

I’ve done one more experiment to attempt to increase the resolution of the DAC.

I figured I could configure the modulator to increase the number of distinct amplitudes produced. I picked the following setting: 20 2F 1D 00 FF FF 0F 0F. This makes the carrier use 15/16ths of a full sine wave, and then adds a 1/32nd amount of phase modulation to push it into outputting the full range.

This however resulted in quite a noisy output, so I wondered if the model that I used to predict the amplitudes was incorrect. I asked Wouter Vermaelen for help with some hardware measurements, since he has a good test set-up for his YM2413 research project.

After a small amendment I arrived at this model, which matched the measurements exactly:

for (uint32_t fnum = 0; fnum < 256; fnum++) {
    uint32_t phase = fnum << block;
    auto sinM = lookupSin(phase * multM >> 10);
    auto modulator = lookupExp(sinM + 16 * 2 * tl) >> 1;
    auto sinC = lookupSin((phase * multC >> 10) + (enableM ? 2 * modulator : 0));
    auto carrier = lookupExp(sinC + 16 * 8 * volume) >> 4;
    buffer[fnum] = carrier;
}

The lookupSin and lookupExp functions come from Wouter’s research.

Unfortunately, the output is still noisy, as can be heard in this test. Luckily Wouter’s measurements already showed why this would be. He took 16 samples after each f-num LSB change, and a cross section of the output looks like this:

  6: 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35
  7: 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
  8: 47 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50
  9: 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55
 10: 61 61 61 61 61 61 61 61 61 61 61 61 61 61 61 61

These values match the model, but why is there a 47 right after setting f-num 8? (It is not a measurement error.)

The explanation is that internally the YM2413 updates one operator every 4 cycles, wrapping around after processing 18 operators for 72 cycles total. If the f-num is written exactly in-between the processing of the modulator and carrier, the carrier will use the new phase while the modulator is still on the old phase, and an incorrect amplitude is output. The modulator receives it 68 cycles later.

In other words, depending on how the timing of the CPU’s write aligns with the processing of the YM2413, a more or less arbitrary amplitude can be output, resulting in the noise we observed.

[Edit: Wouter actually thinks that the delayed update of the modulator is guaranteed to happen, a result of the moment the YM2413 processes the register write rather than the moment it receives the data. I confirm this in the post below.]

Wouter did another measurement, altering the phase more extremely:

  0: 25  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
128: 49 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
  1: 55  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5
129: 44 -6 -6 -6 -6 -6 -6 -6 -6 -6 -6 -6 -6 -6 -6 -6
  2: 61 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11

As you can see, changing the phase by larger amounts makes the spikes clearly visible and more pronounced. It also occurs on every write rather than only occasionally. I suspect that may be due to the specific timing used in his test, but Wouter thinks it is guaranteed, and he may be right.

In conclusion, we can’t use the modulator for sample playback, and should stick to the carrier only.

By Grauw

Ascended (8931)

02-01-2020, 23:12

Grauw wrote:

The explanation is that internally the YM2413 updates one operator every 4 cycles, wrapping around after processing 18 operators for 72 cycles total. If the f-num is written exactly in-between the processing of the modulator and carrier, the carrier will use the new phase while the modulator is still on the old phase, and an incorrect amplitude is output. The modulator receives it 68 cycles later.

In other words, depending on how the timing of the CPU’s write aligns with the processing of the YM2413, a more or less arbitrary amplitude can be output, resulting in the noise we observed.

OK, this was wrong. And Wouter was right :).

Nuke.YKT suggested on Twitter that I try some channel other than channel 0. Channels 0-2 play back with noise, but it turns out that when I use channels 3-8 the noise is gone.

I was wondering why this is, so I started reading the source code of Nuke.YKT’s new Nuked OPLL emulation core. He published it last week and it is very accurate, modeling the YM2413 on a very low level similar to the design of the chip. Let me summarise the processing order.

As I mentioned, the YM2413 runs in a continuous cycle of 18 steps. In each step it processes one operator, in a specific order which is documented in the YM2413 application manual. In each step it also processes one channel’s register write from the CPU, as follows:

  1. Channel 0 modulator + Channel 0 writes
  2. Channel 1 modulator + Channel 1 writes
  3. Channel 2 modulator + Channel 2 writes
  4. Channel 0 carrier + Channel 3 writes
  5. Channel 1 carrier + Channel 4 writes
  6. Channel 2 carrier + Channel 5 writes
  7. Channel 3 modulator + Channel 6 writes
  8. Channel 4 modulator + Channel 7 writes
  9. Channel 5 modulator + Channel 8 writes
  10. Channel 3 carrier + Channel 0 alias 9 writes
  11. Channel 4 carrier + Channel 1 alias 10 writes
  12. Channel 5 carrier + Channel 2 alias 11 writes
  13. Channel 6 modulator + Channel 3 alias 12 writes
  14. Channel 7 modulator + Channel 4 alias 13 writes
  15. Channel 8 modulator + Channel 5 alias 14 writes
  16. Channel 6 carrier + Channel 6 alias 15 writes
  17. Channel 7 carrier
  18. Channel 8 carrier

As you can see here, the modulator evaluation and writes for channels 0-2 happen in the same step, and the write is processed after the modulator is evaluated. This explains why the modulator is lagging behind the carrier in updating its phase.

For channels 3-8 the writes are processed well before their modulators and carriers, so these don’t have that issue where the phase between the two is out of sync, and we can fully control them.

And in fact, we don’t even have to rule out channels 0-2! Because the channel numbers > 8 are mirrored (but are processed later), we can write the data to the mirrored channels 9-11, and they will no longer get written in-between the modulator and carrier’s evaluation.

So new conclusion: we can use the modulator for sample playback as well, but when we want to do it on channels 0-2, we should access it through their mirrors at channel offsets 9-11.
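
As a rough illustration of the step table above, here is a minimal sketch that checks which channel offsets get their write applied in-between the modulator and carrier evaluation. It assumes, as described above, that a write is applied after the operator evaluated in the same step. The function names are hypothetical and are not taken from Nuked OPLL.

```cpp
#include <cassert>

// Steps are numbered 0-17 here (the list above counts them 1-18).

// Step in which the modulator of physical channel ch (0-8) is evaluated.
int modulatorStep(int ch) { return (ch / 3) * 6 + ch % 3; }

// The carrier of the same channel is evaluated three steps later.
int carrierStep(int ch) { return modulatorStep(ch) + 3; }

// Channel offsets 9-15 alias physical channels 0-6.
int physicalChannel(int offset) { return offset % 9; }

// Writes to channel offset r (0-15) are processed in step r.
// Returns true if the write lands after the modulator was evaluated but before
// the carrier, so the two operators briefly disagree on the f-num.
bool writeSplitsOperators(int offset) {
    assert(offset >= 0 && offset <= 15);
    int ch = physicalChannel(offset);
    int writeStep = offset;
    return modulatorStep(ch) <= writeStep && writeStep < carrierStep(ch);
}
```

Under these assumptions only offsets 0-2 report a split, while their mirrors at 9-11 and all other offsets do not, which agrees with what I hear on hardware.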

By NYYRIKKI

Enlighted (5508)

03-01-2020, 02:00

So, now you can play up to 8 "near 8-bit" PCM channels? That is very cool! Do you need cycle-accurate timing, or how does it work? (Some example code from the inner loop would be nice.)

By Grauw

Ascended (8931)

03-01-2020, 04:14

9 :). No cycle-accurate timing needed.

After the register set-up described previously (r0-7, r48, r32 and r15), to play back a 16K block of single-channel pre-mapped PCM data you do:

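    ld hl,8000H      ; start of the sample data
    ld a,10H         ; YM2413 register 10H (channel 0 f-num LSB)
    out (7CH),a      ; select it once on the address port
    ld c,7DH         ; C = data port
Loop:
    call TimerSyncWait
    outi             ; write (HL) to port (C), then inc HL (also decrements B)
    bit 6,h  ; reached C000H?
    jr z,Loop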

Or if you store linear PCM data and do a lookup on the fly:

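    ld hl,8000H      ; start of the linear sample data
    ld de,mappingLookupTable  ; aligned 100H
    ld a,10H         ; YM2413 register 10H (channel 0 f-num LSB)
    out (7CH),a      ; select it once on the address port
    ld c,7DH         ; C = data port
Loop:
    call TimerSyncWait
    ld e,(hl)        ; linear sample becomes the table index
    ld a,(de)        ; look up the mapped register value
    out (c),a
    inc hl
    bit 6,h  ; reached C000H?
    jr z,Loop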

At the example rate of 15.7 kHz (most convenient high-res timer on MSX2), you have 228 cycles per loop at your disposal. As you can see, plenty of time to do whatever you want. Do remember that a YM2413 register write requires an 84-cycle wait though, so you can’t output at full speed.
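
To make that budget concrete, here is a small sketch of the arithmetic. It assumes the standard 3579545 Hz MSX clock (not stated in this post) and reuses the 12 + 84 cycle wait I mentioned earlier; the names are hypothetical. Treat the channel count as a rough upper bound, since it ignores the cycles the loop itself spends.

```cpp
#include <cassert>

constexpr double kCpuClockHz = 3579545.0;  // assumption: standard MSX clock
constexpr int kCyclesPerLoop = 228;        // one sample per loop, as stated above
constexpr int kAddressWait = 12;           // wait after an address write
constexpr int kDataWait = 84;              // wait after a data write

// Sample rate that results from one loop iteration every 228 cycles.
double loopRateHz() { return kCpuClockHz / kCyclesPerLoop; }

// How many channels could be updated per loop if every update writes both
// the register address and the data.
int maxChannelsPerLoop() { return kCyclesPerLoop / (kAddressWait + kDataWait); }
```

With these numbers the loop rate comes out at roughly 15.7 kHz, and at most two channels with address changes would fit in one loop.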

By Manuel

Ascended (16451)

29-03-2020, 22:36

Wouter just integrated the first version of the Nuke.YKT OPLL core into openMSX. The latest development version should be able to correctly emulate these things :). Please help test it and report any issues you find.
