Japanese text encoding standard for MSX?

Page 1/3

By wyrdwad

Paladin (934)

31-01-2018, 06:43

Hey guys,

Some of you may have seen my other topic where I mentioned a freeze bug in M-Kai's 1996 Japanese RPG "Izumic Ballade." Well, since the game was programmed entirely in MSX-BASIC, I've decided to take matters into my own hands. I've converted all the .bas files on the disk to plain text (via the save "filename.bas",a command) and loaded them onto my Windows PC so I can dig through them, figure out exactly where in the code the freeze occurs, and hopefully repair it so I can continue playing without the constant fear of losing progress to a sudden hang during battle.
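For digging through the ASCII-saved listings on the PC side, a minimal Python sketch along these lines may help. It keeps everything as raw bytes so no Japanese text is damaged before the encoding question is settled. The function name is hypothetical, and it assumes the file uses ordinary line endings and may end with a Ctrl-Z (0x1A) end-of-file marker.

```python
import re


def split_basic_listing(raw: bytes) -> dict[int, bytes]:
    """Map BASIC line numbers to their raw (undecoded) statement bytes."""
    # Assumption: anything from the first Ctrl-Z (0x1A) onward is end-of-file padding.
    raw = raw.split(b"\x1a", 1)[0]
    lines = {}
    for row in raw.splitlines():  # bytes.splitlines only splits on \n, \r, \r\n
        match = re.match(rb"\s*(\d+) ?(.*)", row)
        if match:
            lines[int(match.group(1))] = match.group(2)
    return lines


# Tiny made-up sample standing in for the contents of a saved .bas file.
sample = b'10 PRINT"A"\r\n20 DATA\xe9\xcf\r\n\x1a'
listing = split_basic_listing(sample)
print(sorted(listing))
```

Working on bytes first means a search for a given line number never depends on which text encoding the editor guessed.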

While I'm at it, too, I've been toying with the idea of just... translating it. Because why not? It's a clunky game, but a surprisingly well-balanced one, with a good (albeit VERY STRANGE) story and a very satisfying sense of progression.

There are a couple problems with translating the game, though, and the most notable one is figuring out how the text is encoded. If I LIST the program in MSX-BASIC, all the Japanese text displays just fine (with one notable exception that I'll go into below), but when I edit it on my Windows PC (which does have full Japanese extensions installed), I can't seem to find a text encoding method that properly displays everything.

I've tried viewing the .bas files via Shift-JIS, UTF-8, UTF-16, UTF-32, EUC-JP, EUC-JP-212, and Mac Japanese encoding, but the only one that produced any readable text whatsoever was Shift-JIS, which displays all katakana correctly (albeit in half-width characters), but converts all hiragana to garbage characters.
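The katakana-fine, hiragana-garbage pattern is what one would expect if the file holds one byte per kana. The Python sketch below classifies byte values the way a Shift-JIS decoder treats them at the start of a character. The byte values in the demo are assumptions about how the MSX stores this text, not something confirmed in this post.

```python
def shift_jis_role(byte: int) -> str:
    """Classify a byte as a Shift-JIS decoder would at the start of a character."""
    if byte < 0x80:
        return "single byte (ASCII-like)"
    if 0xA1 <= byte <= 0xDF:
        return "single byte (half-width katakana)"
    if 0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xFC:
        return "lead byte of a two-byte character"
    return "unassigned"


# Assumed MSX bytes: 0xB3 0xBD are katakana, 0xE9 is a hiragana, 0xCF a katakana.
katakana = bytes([0xB3, 0xBD])
fused = bytes([0xE9, 0xCF])

print(ascii(katakana.decode("shift_jis")))  # two half-width katakana code points
print(len(fused.decode("shift_jis")))       # the pair is read as one character
print(shift_jis_role(0xE9))
```

Any single-byte hiragana that lands in a lead-byte range pulls the following byte into a two-byte character, which would explain both the stray kanji and the missing katakana next to them.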

I assume someone on these forums must have some experience with this, though, so can anyone advise me on what the standard text encoding method is for plain text files created on an MSX? And is it something I'd be able to use through Windows? Editing massive .bas files using the MSX's built-in editor seems like it could drive even the most stone-faced man to madness, so any help anyone can provide on ways to edit this text a bit more easily would be greatly appreciated. ;)

Additionally, the one thing I'm not seeing in the game text -- even when LISTing it through MSX-BASIC proper -- is kanji. Izumic Ballade uses quite a lot of kanji (much more than I'm used to from an 8-bit game, in fact!), but every single kanji character is represented in the text by a lowercase English letter or piece of punctuation. This means that somewhere in the code, the font has likely been remapped to display kanji in place of certain lowercase English letters and punctuation characters... which, of course, would make translating the game pretty difficult, on multiple levels!

Obviously, I would eventually want to undo this for the English-translated version of the game, but is there a way you guys know of to... I guess, load the font used by the game into memory, with all its modifications, so I can LIST the code and see all the kanji right there in the edit window, for ease of translation?

Here's an example from main.bas -- I have no idea if the code will copy over to this forum correctly or not, but this is what I'm seeing when I view it in Shift-JIS:

10 DEFINTA-Y:RS=RND(-TIME):DIMF(47):DEFFNV=PEEK(&HDDEB):DEFFNK=PEEK(&HDB00+V\8)\2^(VMOD8)MOD2:GOSUB730:A=USR2(X):ONERRORGOTO1970
20 DATAブロード`a,3,ロング`a,15,,25,,50,苞恃饒a,80,蒿゙菶饒a,120,,200,,300,co・゙類,2,|,10,,20,デニムスーツ,35,,60,バトルスーツ,90,,150,,250
30 DATAリードマッピ,5,ロングマッピ,10,,20,,30,,50,,80,&鰕ウスピース,120,,250,,3,,10,戔冷・20,囀門ストラップ,30,,40,,70,,150,,300
40 DATA・沒・憺・・・憊埼$,ムーンシュガー,クリニングペーパー,ハイグリス,メンテナンスキット,リザーブエリクサー,%鴃゙・メタクリルキー,プラチナルキー," ヒデト髢類",梺髢類,ラストキー,pq鯱゚ンシル," 憊雌巵゙・
50 DATAライトシューズ,フックレーザー,ノーベルハンマ,レター,亦・恷・ハチソンシステム,テトラヒドロン,レシーバ,ダイヤモン,d鱆昃,恪・呀・満゙髴腰・,デリミッタ,ユータニー,ハートメシアス,3pqポシエット
60 DATAベークラサイ,0,ゼクケースト,2,アルキメテア,5,マグナボルト,8,ゼグスコル,15,バルブラスト,17,アフリイズ,20,イグザスケール,50,クレシエンド,30,ニクツ,20,アットイトス,15,ゲキルメツ,15,アッチエレランド,10,リタルダンド,10,サクタルフ,1,ラフサクダム,20
70 DATAトランシ,12,サグ,10,ア・テンポ,5,ポコ,0,シーミレ,35,センプレ,50,モルト,99,アドリブ,30,アル,2,アルーマ,5,アルーマセラ,15,アルマセラピータ,40,サーフケア,35,オキシフ,15,ソマリタ,5,キロクスル,1

Notice how all the katakana is fine, but then there's a bunch of messy random kanji in these DATA statements as well. Those sections are showing as plain, ordinary hiragana when I LIST the program on my actual FS-A1WX.

Also, notice the first couple DATA entries: ブロード`a and ロング`a. I happen to know off-hand that these are the first two weapons in the game, which are the "Broad Bass Clarinet" and "Long Bass Clarinet." The `a that appears after the katakana here is showing as `a when the program is LISTed in MSX-BASIC, too, and it's because in-game, those show as the abbreviation "BaCl," written using two custom characters -- the game's shorthand for "bass clarinet."

Kanji function the same way; it's just that instead of two characters that contain "Ba" and "Cl," various bitmapped kanji were substituted for letters and punctuation. I'm fairly certain, for example, that &鰕ウスピース is supposed to be 金のマウスピース (Golden Mouthpiece), with & being the character that gets substituted for the kanji 金, and the random 鰕 text there being the hiragana の getting corrupted by Shift-JIS encoding (which also seems to have swallowed up the マ in マウスピース).
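The Golden Mouthpiece guess can be checked with a small single-byte decoder. This is only a sketch under a stated assumption: that the Japanese MSX character set places katakana and punctuation at the JIS X 0201 positions (0xA1 to 0xDF) and hiragana in the same order 0x20 below and above them (0x86 to 0x9F and 0xE0 to 0xFD). The table and function names are hypothetical, and the byte string in the demo is reconstructed from the guess above rather than read from the disk.

```python
import unicodedata

# Assumed hiragana blocks of the Japanese MSX character set.
HIRAGANA_BLOCKS = {
    0x86: "をぁぃぅぇぉゃゅょっ",
    0x91: "あいうえおかきくけこさしすせそ",
    0xE0: "たちつてとなにぬねのはひふへほまみむめもやゆよらりるれろわん",
}


def build_table() -> dict[int, str]:
    table = {b: chr(b) for b in range(0x20, 0x7F)}  # printable ASCII range
    # 0xA1-0xDF: JIS X 0201 punctuation and half-width katakana (U+FF61-U+FF9F).
    table.update({b: chr(0xFF61 + b - 0xA1) for b in range(0xA1, 0xE0)})
    for start, chars in HIRAGANA_BLOCKS.items():
        table.update({start + i: ch for i, ch in enumerate(chars)})
    return table


MSX_JP = build_table()


def decode_msx_jp(data: bytes) -> str:
    """Decode one byte per character; unknown bytes become <XX> placeholders."""
    text = "".join(MSX_JP.get(b, f"<{b:02X}>") for b in data)
    # NFKC widens half-width katakana and merges voiced marks (e.g. hi + mark -> pi).
    return unicodedata.normalize("NFKC", text)


print(decode_msx_jp(b"&\xe9\xcf\xb3\xbd\xcb\xdf\xb0\xbd"))
```

If the assumption holds, the placeholder `&` survives untouched and the rest reads as plain kana, so only the custom-glyph characters would remain to be mapped by hand.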

(And yeah, all the equipment in the game is related to playing the bass clarinet. This is, as stated, a very odd game!)

Anyway, any help you guys can provide would be appreciated! Hopefully, someone has some clue how to assist me here, as I think this could be a pretty fun little project, and it would be really neat to be able to help make this bizarre novelty of an RPG ultimately playable in English.

-Tom

By JohnHassink

Ambassador (5655)

31-01-2018, 07:27

The font seems to be loaded into page 2 of the VRAM.

The code probably uses some cross-reference table where characters in the listing get 'converted' to the coordinates of a symbol from that page.

By JohnHassink

Ambassador (5655)

31-01-2018, 07:18

No, scratch that. Those sc5 bitmaps are apparently used to 'override' the patterns of the default ASCII symbols.
134 to 165 are Hiragana symbols.
166 to 223 are Katakana.
224 to 253 are Hiragana again.

By JohnHassink

Ambassador (5655)

31-01-2018, 08:01

I could still be missing the point, though.
After the game is initialized, that range of ASCII characters is indeed replaced by hiragana and katakana.
Where the Kanji that's also seen there went, I have no idea. Also, the numbers and Roman symbols seem unaffected.
Displaying them in screen 5 after a break also doesn't show those horizontal grey lines, which do appear during the game.
I think this is some NYYRIKKI level video display manipulation which I can't make chocolate of, especially with the brief look upon it I had just now.

By wyrdwad

Paladin (934)

31-01-2018, 08:00

Apparently, it's BsCl, not BaCl! My mistake there.

But thank you for looking into this -- it's very helpful! I assume the kanji are also overriding default ASCII, probably mostly within the punctuation range... maybe?

If I can get the encoding working right under Windows, it should be easy to substitute in the proper kanji for translation, knowing this. Honestly, I might even be able to write a modern-day Windows BASIC program to do the job for me, based on a lookup table I create -- how surreal would that be? ;)
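The lookup-table idea could look roughly like this in Python. The table is hypothetical: its only two entries are the ones mentioned in this thread, and the `&` to 金 mapping is still a guess. A real table would have to be filled in by comparing the listing against the game screen.

```python
# Hypothetical placeholder table; extend it as each custom glyph is identified.
CUSTOM_GLYPHS = {
    "`a": "BsCl",  # two custom characters forming the bass clarinet abbreviation
    "&": "金",     # guessed from the Golden Mouthpiece item name
}


def expand_custom_glyphs(text: str, table: dict[str, str]) -> str:
    """Replace placeholder characters with the glyphs they stand for in-game."""
    keys = sorted(table, key=len, reverse=True)  # try longer placeholders first
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            out.append(text[i])  # no placeholder starts here; keep the character
            i += 1
    return "".join(out)


print(expand_custom_glyphs("&のマウスピース", CUSTOM_GLYPHS))
```

Since the placeholders are ordinary letters and punctuation, a substitution like this should only be run on the contents of DATA items and string literals, never on the BASIC statements around them.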

-Tom

By wyrdwad

Paladin (934)

31-01-2018, 08:22

Actually, hmm. The more I think about it, the more this strikes me as a little odd -- when I LIST the program in MSX-BASIC, all the katakana and hiragana shows up just fine right there. And even when I view the code on my Windows PC, the katakana mostly shows up fine.

If the program is replacing ASCII characters with katakana and hiragana, shouldn't all the text I'm LISTing show as a gibberish string of random English letters instead, especially under Windows?

Do Japanese machines have a default set of katakana and hiragana ASCII characters in place of English lettering, perhaps? If so, maybe the game is just replacing the system characters with customized ones -- adapting a new font, basically, but leaving the positioning of the katakana and hiragana letters in place.

-Tom

By JohnHassink

Ambassador (5655)

31-01-2018, 08:36

Does it show the Japanese characters in the listing only after the game is initialized, or also when you boot your MSX without the disk inserted and only then have a look at the listing?
If the latter is the case, then you have proof that the game uses that particular system ROM which I told you about before, the one that contains Japanese symbols and which is only present in certain MSX models.

By wyrdwad

Paladin (934)

31-01-2018, 09:34

Yep. Sounds like that's the case, then, because it shows Japanese characters even if I boot the system with no disk inserted, then pop in the disk and type load"main.bas" followed by LIST.

Might explain why you were encountering disk errors on some European model MSXes, too, perhaps.

-Tom

By Manuel

Ascended (19273)

31-01-2018, 10:33

Are you listing in kanji mode or in plain basic?
In the latter case you just get the normal Japanese MSX font, no special ROM should affect that, as the character set is defined in the specification.

By wyrdwad

Paladin (934)

31-01-2018, 11:01

Manuel wrote:

Are you listing in kanji mode or in plain basic?
In the latter case you just get the normal Japanese MSX font, no special ROM should affect that, as the character set is defined in the specification.

I'm literally just booting my FS-A1WX, waiting for MSX-BASIC to load, popping in the disk with Izumic Ballade on it, then typing:

load"main.bas"
list

And I have no battery in my system, so there are no configuration settings saved that differ from the norm. It's just a basic MSX2+ default setup.

So I guess it's the normal Japanese MSX font indeed. The question is, what is the encoding method used?

...And I suppose the answer may be, simple ASCII! That might be why I'm having so much trouble here: no modern-day Japanese encoding methods are designed to account for an ASCII character table that includes katakana and hiragana, instead expecting 2-byte or 4-byte sequences for Japanese characters. The concept of storing a Japanese character in a single byte of data is virtually unheard of on any modern system!

I'll have to dig into this a little more, and see if I can find some sort of robust text editor I can use in Windows that accounts for displaying ASCII-table Japanese.
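An alternative to finding a special editor would be to convert the listing to UTF-8, edit it in any modern editor, and convert it back before copying it to the MSX. Below is a sketch of the return trip in Python. It rests on the same unconfirmed assumption about the character layout (JIS X 0201 positions for katakana, hiragana at 0x86 to 0x9F and 0xE0 to 0xFD), and all names in it are hypothetical.

```python
import unicodedata

# Assumed hiragana blocks of the Japanese MSX character set.
HIRAGANA_BLOCKS = {
    0x86: "をぁぃぅぇぉゃゅょっ",
    0x91: "あいうえおかきくけこさしすせそ",
    0xE0: "たちつてとなにぬねのはひふへほまみむめもやゆよらりるれろわん",
}


def build_reverse_table() -> dict[str, int]:
    reverse = {chr(b): b for b in range(0x20, 0x7F)}  # printable ASCII range
    for b in range(0xA1, 0xE0):
        halfwidth = chr(0xFF61 + b - 0xA1)
        # Key on the full-width form so normally typed katakana can be looked up.
        reverse[unicodedata.normalize("NFKC", halfwidth)] = b
    for start, chars in HIRAGANA_BLOCKS.items():
        for offset, ch in enumerate(chars):
            reverse[ch] = start + offset
    return reverse


TO_MSX = build_reverse_table()


def encode_msx_jp(text: str) -> bytes:
    """Encode text as one byte per kana; voiced kana become kana + mark byte."""
    out = bytearray()
    # NFD splits voiced kana into base + combining mark, matching the two-byte form.
    for ch in unicodedata.normalize("NFD", text):
        if ch not in TO_MSX:
            raise ValueError(f"no single-byte MSX code assumed for {ch!r}")
        out.append(TO_MSX[ch])
    return bytes(out)


print(encode_msx_jp("&のマウスピース").hex(" "))
```

Raising an error on anything without a single-byte code is deliberate: a kanji or other unsupported character left in the edited text gets caught on the PC instead of turning into garbage on the MSX.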

-Tom

By JohnHassink

Ambassador (5655)

31-01-2018, 11:50

So what is the "normal Japanese MSX font"?
I remember games like "Final Zone Wolf" and "Starship Rendezvous" displaying what's basically the ASCII row of "Wingdings" instead of Japanese characters, where it would show correctly on Japanese MSX machines.
