Voice synthesis on ISR

Page 31/31
24 | 25 | 26 | 27 | 28 | 29 | 30 |

By Grauw

Ascended (10063)

07-03-2021, 14:26

I see, interesting…

I did some experiments to see exactly what could be going on. I listened to sccLOFI_1c-3.rom in openMSX with "set speed 10", which does not (currently) affect the sound chip clock speed, so it is a convenient way to hear exactly what’s going on. Additionally, I looked at the waveform with toggle_scc_viewer (modified to update every frame).

Effect 6 has some warble in it, so I tested with that one. When the warble occurs I hear a lot more overtones, and in the SCC viewer I see the waveform change to a 2nd order wave…

Video of the experiment here.

By ARTRAG

Enlighted (6543)

07-03-2021, 18:43

I see what you mean, and I think the frames you have spotted are "unvoiced" segments that the pitch estimation algorithm failed to detect. In those regions, which are usually noisy, I use the strongest frequency in the spectrum (the spectral peak) to approximate the sound. I know it is a very bad strategy, but I wasn't able to think of anything better with only 32 samples.
BTW, the noise you refer to is much more continuous, and I don't think it is related to those single frames.
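The fallback for unvoiced frames picks the frequency peak of the spectrum. Below is a minimal sketch of that idea, reading it as "the strongest non-DC bin of a plain DFT". The encoder's actual analysis (window, frame length, interpolation) is not shown in the thread, so the function name, the 64-sample frame and the 8000 Hz rate are all hypothetical.

```python
import cmath
import math

def spectral_peak_hz(frame, sample_rate):
    """Frequency (Hz) of the strongest non-DC DFT bin of one analysis frame.

    Plain DFT, no window and no interpolation between bins, so the result is
    quantized to multiples of sample_rate / len(frame).
    """
    n = len(frame)
    best_bin, best_mag = 1, -1.0
    # For a real-valued frame only bins 1..n/2 are distinct; bin 0 is DC.
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n

# Example: a 1000 Hz sine, 64 samples at a hypothetical 8000 Hz rate (125 Hz bins).
frame = [math.sin(2 * math.pi * 1000 * i / 8000) for i in range(64)]
print(spectral_peak_hz(frame, 8000))
```

The bin spacing is sample_rate / len(frame), so the shorter the frame, the coarser the frequency this fallback can pick. That is one reason a single spectral peak is a crude model for noisy segments.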

By Grauw

Ascended (10063)

07-03-2021, 19:48

ARTRAG wrote:

BTW, the noise you refer to is much more continuous, and I don't think it is related to those single frames.

Hmm, yes, you’re right. But if it were due to discontinuities between frames, I would expect the noise to be pitched at 60 Hz, and this is much higher… Also, the waveforms shouldn’t change significantly from frame to frame, so in principle a mere change in tonal character shouldn’t cause a big discontinuity.
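The 60 Hz argument can be checked with an idealized model. If the only artifact were a click at every frame boundary, it would repeat every 1/60 s, and all of its energy would sit at multiples of 60 Hz. The sketch below assumes unit clicks and a hypothetical 960 Hz sample rate, chosen only so that the bins are easy to read.

```python
import cmath
import math

def dft_bin_magnitude(signal, k):
    """Magnitude of DFT bin k of a signal, computed as a direct sum (one bin only)."""
    n = len(signal)
    return abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal)))

# Idealized model (assumption): one second of silence with a unit click at
# the start of every 60 Hz frame, i.e. a discontinuity once per frame.
SAMPLE_RATE = 960                     # hypothetical rate: 16 samples per frame
FRAME_RATE = 60
SAMPLES_PER_FRAME = SAMPLE_RATE // FRAME_RATE
clicks = [1.0 if i % SAMPLES_PER_FRAME == 0 else 0.0 for i in range(SAMPLE_RATE)]

# One second of signal gives 1 Hz bins, so bin k corresponds to k Hz.
for hz in (30, 60, 90, 120, 180):
    print(f"{hz:4d} Hz: {dft_bin_magnitude(clicks, hz):.3f}")
```

Only the multiples of 60 Hz show energy (magnitude 60, one unit per click). The 30 Hz and 90 Hz bins come out as zero. A real frame-boundary discontinuity is not a unit click, but anything that repeats once per frame is periodic at the frame rate, which is why such noise would be expected to sound pitched at 60 Hz.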

In Synthesix I had the issue that the SCC resets the waveform phase whenever the frequency is set; I avoided it by only setting the frequency when it changes. During pitch bends it wasn’t too noticeable. An easy test would be to check whether the warble disappears with a fixed frequency…
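The "only set the frequency when it changes" workaround amounts to caching the last period written per channel. Here is a minimal sketch. The class name is made up, and `write_period` is a hypothetical stand-in for whatever routine actually writes a channel's period registers on the SCC.

```python
from typing import Callable, Dict

class SccFrequencyCache:
    """Skip redundant SCC period writes so the waveform phase is not reset.

    Hypothetical sketch: `write_period(channel, period)` stands in for the
    real register-write routine, which is not shown in the thread.
    """

    def __init__(self, write_period: Callable[[int, int], None]):
        self._write_period = write_period
        self._last: Dict[int, int] = {}   # channel -> last period written

    def set_period(self, channel: int, period: int) -> bool:
        """Write the period only if it differs from the last one; return True if written."""
        if self._last.get(channel) == period:
            return False                   # unchanged: no write, phase keeps running
        self._write_period(channel, period)
        self._last[channel] = period
        return True

# Usage: four frames on channel 0, where only the last one changes pitch.
writes = []
cache = SccFrequencyCache(lambda ch, p: writes.append((ch, p)))
for p in (1696, 1696, 1696, 1600):
    cache.set_period(0, p)
print(writes)
```

With a held note only the first write reaches the chip, so the phase reset happens once instead of every frame. During a pitch bend every frame still writes, which fits the observation that the reset was less noticeable there.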

Some of it could also come from the source material. If I listen to this clip on YouTube where the vocals have been isolated, I do hear some odd harmonics; they are more subtle, but maybe they are amplified by the algorithm.

Oh, but I just checked WYZ’s True Survivor ROM (posted on page 30) in openMSX, and it’s using the 5-channel technique, so it was probably processed differently. The samples in your sccLOFI_1c-3.rom, made with the newer 1-channel conversion, sound pretty clean overall.

By ARTRAG

Enlighted (6543)

23-03-2021, 18:51

Here is a new version of the standalone encoder:
https://github.com/cornelisser/TriloTracker/issues/146

I've added more parameters aimed at encoding instruments and non-vocal sounds.
On the command line you can use:
tnn
where nn is a two-digit integer (00-99) that sets the threshold used to switch processing between voiced and unvoiced segments. It is interpreted as a probability in the range 0.00-0.99.
nn=00 means that the whole sample is processed as voiced, using the estimated pitch.
nn=99 means that almost the whole sample is processed as unvoiced, using the frequency peak as the base frequency.
The default threshold is now 0 (previously it was 0.05).
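The threshold acts as a per-frame switch. Below is a minimal sketch, assuming the pitch estimator reports a voicing probability for each frame and that a frame is treated as unvoiced when that probability is strictly below the threshold (as in the "probability < 0.05" reading of t05). The function names are hypothetical, not the encoder's.

```python
def parse_threshold(token: str) -> float:
    """Turn a 'tnn' option (e.g. 't05') into a probability between 0.00 and 0.99."""
    if len(token) != 3 or token[0].lower() != "t" or not token[1:].isdecimal():
        raise ValueError(f"expected 't' plus two digits, got {token!r}")
    return int(token[1:]) / 100

def is_unvoiced(voicing_probability: float, threshold: float) -> bool:
    """Assumed per-frame rule: treat the frame as unvoiced below the threshold."""
    return voicing_probability < threshold

# With the default t00 nothing is below 0.0, so every frame stays voiced;
# with t99 only frames with a voicing probability of at least 0.99 stay voiced.
for prob in (0.0, 0.03, 0.5, 0.995):
    print(prob, is_unvoiced(prob, parse_threshold("t00")), is_unvoiced(prob, parse_threshold("t99")))
```

The strict comparison is what makes nn=00 mean "everything voiced": no probability can be below zero.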

You can also use:
gmmmmm
where mmmmm is a 5-digit number giving the SCC period.
Pay attention to the number of digits: there must be exactly 5.
For example, with mmmmm=01696 you get the note C2.
This parameter forces the pitch used in the waveform approximation to a known value.
It is useful when sampling instruments where the note played in the sample is known.
By default the pitch is estimated for each frame.
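To pick the right gmmmmm value for a known note, the period can be converted to and from a frequency with the formula 3579545/(period+1)/32. The division by 32 corresponds to one pass through the SCC's 32-step waveform, and the SCC period register is 12 bits wide. The helper names below are hypothetical.

```python
SCC_CLOCK_HZ = 3579545        # clock constant from the period formula

def period_to_hz(period: int) -> float:
    """SCC tone frequency: clock / (period + 1) / 32, one cycle per 32-step waveform."""
    if not 0 <= period <= 0xFFF:
        raise ValueError("the SCC period register is 12 bits (0-4095)")
    return SCC_CLOCK_HZ / (period + 1) / 32

def hz_to_period(freq_hz: float) -> int:
    """Nearest SCC period for a target frequency (inverse of period_to_hz)."""
    period = round(SCC_CLOCK_HZ / (32 * freq_hz) - 1)
    if not 0 <= period <= 0xFFF:
        raise ValueError(f"{freq_hz} Hz is outside the SCC period range")
    return period

def g_option(period: int) -> str:
    """Format the gmmmmm option: 'g' followed by exactly five digits."""
    return f"g{period:05d}"

print(period_to_hz(1696))              # about 65.92 Hz
print(g_option(hz_to_period(period_to_hz(1696))))
```

Period 1696 lands at about 65.9 Hz, close to C2 (about 65.4 Hz in equal temperament with A4 = 440 Hz). The zero padding in g_option is what keeps the option at the required five digits.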

For example, try -p60t05g01696 to get:

NTSC frames, 60 Hz (p60)
Unvoiced processing if probability < 0.05 (t05)
Pitch forced to 3579545/(period+1)/32 Hz, where period is the SCC period = 1696, i.e. about 65.9 Hz (g01696)

Note: the parameters are written without spaces. The parser is very limited, but it is case-insensitive (e.g. P60 and p60 are the same).
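The encoder's parser itself is not shown, so the following is only a hypothetical re-creation of the rules described above. Options are concatenated without spaces, letters are case-insensitive, t takes two digits and g takes exactly five. That p also takes two digits is an assumption based on the p60 example.

```python
from typing import Dict

# Digits expected after each option letter (two for p is an assumption from "p60").
_WIDTHS = {"p": 2, "t": 2, "g": 5}

def parse_options(arg: str) -> Dict[str, int]:
    """Parse a string like '-p60t05g01696' (case-insensitive, no spaces)."""
    s = arg.lower().lstrip("-")
    opts: Dict[str, int] = {}
    i = 0
    while i < len(s):
        letter = s[i]
        width = _WIDTHS.get(letter)
        if width is None:
            raise ValueError(f"unknown option {letter!r} at position {i}")
        digits = s[i + 1 : i + 1 + width]
        if len(digits) != width or not digits.isdecimal():
            raise ValueError(f"option {letter!r} needs exactly {width} digits")
        opts[letter] = int(digits)
        i += 1 + width
    return opts

print(parse_options("-p60t05g01696"))   # {'p': 60, 't': 5, 'g': 1696}
```

Because each letter consumes a fixed number of digits, a four-digit period such as g1696 is rejected rather than silently misread, which is why the five-digit rule matters.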
