Japanese text encoding standard for MSX?

Page 2/3

By wyrdwad

Paladin (934)

31-01-2018, 11:50

JohnHassink wrote:

So what is the "normal Japanese MSX font"?
I remember games like "Final Zone Wolf" and "Starship Rendezvous" displaying what's basically the ASCII row of "Wingdings" instead of Japanese characters, where it would show correctly on Japanese MSX machines.

I know one way to find out!

Haven't tested this yet to see if it works, but this should print a nice ASCII table on the screen when run, theoretically. Might need some tweaking -- I'll give it a go tomorrow when it's not 2:30 in the morning with work the next day. Wink

5 rem codes 0-31 and 127 are control codes on MSX, so start at 32 and skip 127
10 for i = 32 to 120
20 print i; chr$(i); " ";
30 if i mod 6 = 0 then print ""
40 next i
50 print "Press any key to continue"
60 a$ = inkey$
70 if a$ = "" then goto 60
110 for i = 121 to 240
115 if i = 127 then goto 140
120 print i; chr$(i); " ";
130 if i mod 6 = 0 then print ""
140 next i
150 print "Press any key to continue"
160 a$ = inkey$
170 if a$ = "" then goto 160
210 for i = 241 to 255
220 print i; chr$(i); " ";
230 if i mod 6 = 0 then print ""
240 next i
250 print ""
260 end

-Tom

By gdx

Enlighted (6071)

31-01-2018, 11:59

The character codes of Japanese MSX machines are based on JIS X 0201. Codes above 80h are MSX-specific, except for the katakana.

The character codes of international MSX machines are based on ASCII. Several codes below 20h and all characters above 80h are MSX-specific.

The Kanji ROM of the MSX seems to use JIS X 0208 codes for kanji (level 1, plus level 2 on newer machines) and JIS X 0201 for ANK (alphanumeric and kana) characters.
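
As a rough illustration of the JIS X 0201 part of that layout, here is a minimal Python sketch (Python is only a convenient choice for a PC-side tool, nothing in the thread prescribes it). It relies on two facts: 20h-7Eh follow the ASCII positions, and A1h-DFh are the half-width katakana, which Unicode places at U+FF61 to U+FF9F in the same order. The function name `jisx0201_to_unicode` is made up for the example, and every other byte is reported as "not covered" rather than guessed at, since those are the MSX-specific codes.

```python
from typing import Optional


def jisx0201_to_unicode(byte: int) -> Optional[str]:
    """Map one JIS X 0201 byte to a Unicode character.

    Returns None for bytes outside the two text ranges (on an MSX these
    are control codes or MSX-specific glyphs, which this sketch ignores).
    """
    if 0x20 <= byte <= 0x7E:
        # Roman half: same positions as ASCII. JIS X 0201 proper shows a
        # yen sign at 5Ch and an overline at 7Eh; this sketch keeps ASCII.
        return chr(byte)
    if 0xA1 <= byte <= 0xDF:
        # Katakana half: half-width forms, same order as U+FF61..U+FF9F.
        return chr(0xFF61 + (byte - 0xA1))
    return None


if __name__ == "__main__":
    for code in (0x41, 0xB1, 0xDF, 0x90):
        print(f"{code:02X}h ->", jisx0201_to_unicode(code))
```

Anything that comes back as None would have to be looked up in the MSX font chart for the machine (or game) in question.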

By Randam

Paragon (1431)

31-01-2018, 20:18

https://www.msx.org/wiki/MSX_font#Japanese

There you'll find the "standard" Japanese font that games get when they rely on the default font of a Japanese MSX. Load those games on a non-Japanese MSX and they show the MSX equivalent of "Wingdings", as JohnHassink calls it. Many games have custom fonts, though, and their characters are often in a different order than the "regular" one.

By wyrdwad

Paladin (934)

31-01-2018, 21:05

Awesome, thanks! It does seem like this game uses a custom font, but it kept the order of katakana and hiragana characters, at least -- it just replaced a lot of other characters with various kanji and game-use symbols (like the BsCl abbreviation).

-Tom

By wyrdwad

Paladin (934)

01-02-2018, 08:08

Success! I used that image John posted, generated an ASCII lookup table specific to this game, and created a rudimentary Qbasic program (mixing the BASICs, yo!) to substitute in all the kanji and hiragana that was being left out when I tried to interpret the files through regular ol' Shift-JIS, and this is the result:

10 DEFINTA-Y:RS=RND(-TIME):DIMF(47):DEFFNV=PEEK(&HDDEB):DEFFNK=PEEK(&HDB00+V\8)\2^(VMOD8)MOD2:GOSUB730:A=USR2(X):ONERRORGOTO1970
20 DATAフ゛ロート゛[BsCl],3,ロンク゛[BsCl],15,,25,,50,とこしえの[BsCl],80,とと゛ろきの[BsCl],120,,200,,300,全国ふた゛んき゛,2,[kanjiunknown5_pink]服,10,,20,テ゛ニムスーツ,35,,60,ハ゛トルスーツ,90,,150,,250
30 DATAリート゛マッヒ゜,5,ロンク゛マッヒ゜,10,,20,,30,,50,,80,金のマウスヒ゜ース,120,,250,,3,,10,しろきつな,20,こうかなストラッフ゜,30,,40,,70,,150,,300
40 DATAは゛んそうこ,むかしのひやく,てんし゛ゅの水,ムーンシュカ゛ー,クリニンク゛ヘ゜ーハ゜ー,ハイク゛リス,メンテナンスキット,リサ゛ーフ゛エリクサー,木のほ゛う,メタクリルキー,フ゜ラチナルキー," ヒテ゛トのかき゛",もくせいのかき゛,ラストキー,次元のヘ゜ンシル," し゛ょうさた゛め"
50 DATAライトシュース゛,フックレーサ゛ー,ノーヘ゛ルハンマ,レター,かいいんしょう,ハチソンシステム,テトラヒト゛ロン,レシーハ゛,タ゛イヤモン,県のちす゛,しゅんかんけ゛い,かせ゛のしゅくふく,テ゛リミッタ,ユータニー,ハートメシアス,3次元ホ゜シエット
60 DATAヘ゛ークラサイ,0,セ゛クケースト,2,アルキメテア,5,マク゛ナホ゛ルト,8,セ゛ク゛スコル,15,ハ゛ルフ゛ラスト,17,アフリイス゛,20,イク゛サ゛スケール,50,クレシエント゛,30,ニクツ,20,アットイトス,15,ケ゛キルメツ,15,アッチエレラント゛,10,リタルタ゛ント゛,10,サクタルフ,1,ラフサクタ゛ム,20
70 DATAトランシ,12,サク゛,10,ア・テンホ゜,5,ホ゜コ,0,シーミレ,35,センフ゜レ,50,モルト,99,アト゛リフ゛,30,アル,2,アルーマ,5,アルーマセラ,15,アルマセラヒ゜ータ,40,サーフケア,35,オキシフ,15,ソマリタ,5,キロクスル,1
80 DATAクスリ,32,824,33,2575,34,51500,36,927,着物,8,5150,9,32960,11,236900,13,67501050,アクセサリ,56,10000,26,15450,27,92700,22,255655350,ほん、AV,57,2575,57,2575,55,61800,38,41200,"楽器",16,2500,1,566500,4,1751000,17,70040,持物,41,1030,42,6180,37,15450,63,412000
90 DATAひっさつ,ほのお,れいき,あらし,は゛くはつ,いかす゛ち,いんせきのは゛くふう,ハサ゛ンフ゛レート゛,ねむり,と゛く,きゅうけつ,かきけし,ちりょう,ふっかつ,にけ゛る

As you can see, there were some kanji I couldn't quite make out in that image, but now that I can see them in context, I should be able to figure out what they are and revise my text conversion tool accordingly.
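
For anyone wanting to build a similar converter, here is a minimal Python sketch of the general idea: a base table, a set of game-specific overrides layered on top, and a visible placeholder for any byte that has not been identified yet. It is not the QBasic tool described above. Every entry in `GAME_OVERRIDES` is hypothetical (the real codes have to be read off the font image), and the names `build_table`, `convert` and `compose_voiced_marks` are invented for the example. The last function is an optional extra: it merges a kana followed by a standalone ゛ or ゜ into the single precomposed character, using Unicode NFC normalization.

```python
import unicodedata
from typing import Dict

# Base table: printable ASCII maps to itself.
BASE_TABLE: Dict[int, str] = {code: chr(code) for code in range(0x20, 0x7F)}

# HYPOTHETICAL overrides, for illustration only. The placeholder codes
# 80h/81h are made up; CCh and DEh follow the JIS X 0201 katakana order.
GAME_OVERRIDES: Dict[int, str] = {
    0x80: "[BsCl]",
    0x81: "[kanjiunknown5_pink]",
    0xCC: "\u30D5",  # katakana FU
    0xDE: "\u309B",  # standalone voiced sound mark (dakuten)
}


def build_table(base: Dict[int, str], overrides: Dict[int, str]) -> Dict[int, str]:
    """Return a copy of base with the game-specific entries layered on top."""
    table = dict(base)
    table.update(overrides)
    return table


def convert(data: bytes, table: Dict[int, str]) -> str:
    """Translate bytes via the table; unknown bytes become a [XXh] marker."""
    return "".join(table.get(b, f"[{b:02X}h]") for b in data)


def compose_voiced_marks(text: str) -> str:
    """Merge kana + standalone dakuten/handakuten into precomposed kana."""
    combining = text.replace("\u309B", "\u3099").replace("\u309C", "\u309A")
    return unicodedata.normalize("NFC", combining)


TABLE = build_table(BASE_TABLE, GAME_OVERRIDES)

if __name__ == "__main__":
    raw = bytes([0xCC, 0xDE, 0x80, 0xFF])
    print(compose_voiced_marks(convert(raw, TABLE)))
```

Keeping unknown bytes visible as `[XXh]` markers makes it easy to spot which glyphs still need identifying once they show up in context.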

With this, I will absolutely be able to translate this text -- and pretty easily, at that.

So I guess this means the Izumic Ballade translation project is on, baby! Wooooo!! Wink

-Tom

By JohnHassink

Ambassador (5655)

01-02-2018, 09:42

Well that is pretty cool! Smile
Can you judge from this point how easy or hard it will be to replace the texts?
I guess it will require some (slight) reprogramming of the text routine, but if the game usually just fills text boxes, waits for player input, then commences the text crawl, it shouldn't be that complicated.
Unless the files already fill the RAM to the brim. You can check that with

?FRE(0)

That shows the number of bytes of RAM still free, on top of what the currently loaded BASIC program is already taking up.
I'm saying this because, as we all know, accurate translations from Japanese to English usually take more character space to convey the same meaning, and I can already see those DATA lines in which the dialogue is stored increasing by 150% to 200%.
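
To put rough numbers on that worry, here is a small Python sketch of the arithmetic. It assumes you know how many bytes the text currently occupies and what ?FRE(0) reports. The growth factor is read here as "the new text is N times the size of the old text" (so 1.5 means 50% bigger), which is only one possible reading of the percentages above. The function names and the example figures are made up.

```python
import math


def extra_bytes_needed(text_bytes: int, growth_factor: float) -> int:
    """Extra bytes required if the text grows to growth_factor times its size."""
    grown = math.ceil(text_bytes * growth_factor)
    return max(0, grown - text_bytes)


def fits_in_free_ram(text_bytes: int, growth_factor: float, free_bytes: int) -> bool:
    """True if the growth fits into the free bytes reported by FRE(0)."""
    return extra_bytes_needed(text_bytes, growth_factor) <= free_bytes


if __name__ == "__main__":
    # Hypothetical figures: 1000 bytes of text, 800 bytes free.
    for factor in (1.5, 2.0):
        print(factor, extra_bytes_needed(1000, factor), fits_in_free_ram(1000, factor, 800))
```

If the answer comes out negative for a given file, the usual options are shorter wording or moving text out of that file.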

By wyrdwad

Paladin (934)

01-02-2018, 09:39

Hmm. That is a good point! I'll have to check and see.

Should be easy enough to replace the text, though -- I think rather than reprogramming the text routine, all that's going to need to be done is redrawing that image you linked on the previous page! Recreate a base font, with lowercase letters and punctuation rather than hiragana, katakana, and kanji, and I can just... type new lines in English! There's no reason, at that point, that they shouldn't show up just fine in-game, even just using the text routines that already exist.

I've got MP83 on board to help with that part, too, so I shouldn't have any trouble beyond simply dissecting the code and making sure everything actually remains stable.

...BTW, only tangentially related, but I believe I'm at the game's final boss now in my playthrough... and it is KICKING MY ASS. Man, what a tough final boss! This is going to require some serious planning, and also some serious spending -- I'm going to need to raid the item shop in town and buy every last healing item I can muster, because in this battle, it seems like items are my last, best hope!

-Tom

By sd_snatcher

Prophet (3642)

11-03-2018, 04:05

Quote:

text there being the hiragana の getting corrupted by Shift-JIS encoding (which also seems to have swallowed up the マ in マウスピース).

Note: If you use CALL KANJI on the MSX, this very same effect with the Hiragana will happen too.
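
A likely explanation for the swallowed character, sketched in Python: Shift-JIS treats bytes 81h-9Fh and E0h-FCh as the first half of a two-byte character, and those ranges overlap the area where the Japanese MSX table keeps its hiragana. The example below assumes の sits at E9h in the MSX table (an assumption based on the usual kana order, so check it against the font chart) and uses CFh for マ, which is its JIS X 0201 position. Decoded as Shift-JIS, the pair collapses into a single, unrelated kanji.

```python
def is_sjis_lead_byte(b: int) -> bool:
    """True if Shift-JIS treats this byte as the start of a 2-byte character."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


# ASSUMPTION: E9h is the MSX code for hiragana NO; CFh is katakana MA.
msx_bytes = bytes([0xE9, 0xCF])

# Read as Shift-JIS, the lead byte E9h consumes the following CFh.
decoded_pair = msx_bytes.decode("shift_jis")

# On its own, CFh decodes to the half-width katakana MA as expected.
decoded_ma_alone = bytes([0xCF]).decode("shift_jis")

if __name__ == "__main__":
    print(len(msx_bytes), "bytes ->", len(decoded_pair), "character:", decoded_pair)
    print("CFh alone ->", decoded_ma_alone)
```

This is why a byte-for-byte lookup table is the safer way to read these files: no byte can ever consume its neighbour.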

By panel123

Supporter (1)

12-03-2018, 16:57

There are several standard methods to encode Japanese characters for use on a computer, including JIS, Shift-JIS, EUC, and Unicode. While mapping the set of kana is a simple matter, kanji has proven more difficult. Despite efforts, none of the encoding schemes have become the de facto standard, and multiple encoding standards are still in use today.

For example, most Japanese emails are in ISO-2022-JP ("JIS encoding") and web pages in Shift-JIS and yet mobile phones in Japan usually use some form of Extended Unix Code. If a program fails to determine the encoding scheme employed, it can cause mojibake (文字化け, "misconverted garbled/garbage characters", literally "transformed characters") and thus unreadable text on computers.
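
Mojibake is easy to reproduce. The Python sketch below writes a string in one encoding and reads it back in another, using codecs from the standard library; the helper name `reinterpret` is made up for the example. Undecodable bytes are replaced instead of raising an error, so the garbling stays visible.

```python
def reinterpret(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one codec and decode the bytes with another."""
    raw = text.encode(written_as)
    return raw.decode(read_as, errors="replace")


ORIGINAL = "文字化け"

if __name__ == "__main__":
    print("utf-8 read as utf-8:     ", reinterpret(ORIGINAL, "utf-8", "utf-8"))
    print("utf-8 read as shift_jis: ", reinterpret(ORIGINAL, "utf-8", "shift_jis"))
    print("shift_jis read as euc_jp:", reinterpret(ORIGINAL, "shift_jis", "euc_jp"))
```

Only the first line comes back intact; the other two show what a program produces when it guesses the encoding wrong.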

The first encoding to become widely used was JIS X 0201, which is a single-byte encoding that only covers standard 7-bit ASCII characters with half-width katakana extensions. This was widely used in systems that were neither powerful enough nor had the storage to handle kanji (including old embedded equipment such as cash registers). This means that only katakana, not kanji, was supported using this technique. Some embedded displays still have this limitation.

The development of kanji encodings was the beginning of the split. Shift JIS supports kanji and was developed to be completely backward compatible with JIS X 0201, which is why it is found in much embedded electronic equipment.

However, Shift JIS has the unfortunate property that it often breaks any parser (software that reads the coded text) that is not specifically designed to handle it. For example, a text search method can get false hits if it is not designed for Shift JIS. EUC, on the other hand, is handled much better by parsers that have been written for 7-bit ASCII (and thus EUC encodings are used on UNIX, where much of the file-handling code was historically only written for English encodings). But EUC is not backwards compatible with JIS X 0201, the first main Japanese encoding. Further complications arise because the original Internet e-mail standards only support 7-bit transfer protocols. Thus RFC 1468 ("ISO-2022-JP", often simply called JIS encoding) was developed for sending and receiving e-mails.
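
The false-hit problem can be shown with a well-known case. In Shift-JIS, the second byte of 表 is 5Ch, the same value as an ASCII backslash, so a byte-wise search for a backslash matches inside the kanji. The Python sketch below (helper name `naive_contains` is made up) compares the same character in Shift-JIS, EUC-JP and ISO-2022-JP.

```python
def naive_contains(haystack: bytes, needle: bytes) -> bool:
    """Byte-wise substring search that knows nothing about encodings."""
    return needle in haystack


WORD = "表"
sjis = WORD.encode("shift_jis")    # two bytes; the second one is 5Ch
euc = WORD.encode("euc_jp")        # both bytes have the high bit set
jis = WORD.encode("iso2022_jp")    # escape sequences plus 7-bit bytes

if __name__ == "__main__":
    print("Shift-JIS bytes:", sjis.hex(), "backslash hit:", naive_contains(sjis, b"\\"))
    print("EUC-JP bytes:   ", euc.hex(), "backslash hit:", naive_contains(euc, b"\\"))
    print("ISO-2022-JP is 7-bit:", all(b < 0x80 for b in jis))
```

In EUC-JP every byte of a kanji has its high bit set, so it can never be mistaken for an ASCII character, while ISO-2022-JP stays entirely within 7 bits, which is what the old e-mail transports required.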

In character set standards such as JIS, not all required characters are included, so gaiji (外字 "external characters") are sometimes used to supplement the character set. Gaiji may come in the form of external font packs, where normal characters have been replaced with new characters, or the new characters have been added to unused character positions. However, gaiji are not practical in Internet environments since the font set must be transferred with text to use the gaiji. As a result, such characters are written with similar or simpler characters in place, or the text may need to be encoded using a larger character set (such as Unicode) that supports the required character.

Unicode was intended to solve all encoding problems across all languages. The UTF-8 encoding used to encode Unicode in web pages does not have the disadvantages that Shift-JIS has. Unicode is supported by international software, and it eliminates the need for gaiji. There are still controversies, however. For Japanese, the kanji characters have been unified with Chinese; that is, a character considered to be the same in both Japanese and Chinese is given a single number, even if the appearance is actually somewhat different, with the precise appearance left to the use of a locale-appropriate font. This process, called Han unification, has caused controversy. The previous encodings in Japan, Taiwan Area, Mainland China and Korea each handled only one language, whereas Unicode has to handle all of them. The handling of kanji/Chinese characters has, however, been designed by a committee composed of representatives from all four countries/areas. Unicode adoption is slowly growing because it is better supported by software from outside Japan, but still (as of 2011) most web pages in Japanese use Shift-JIS. The Japanese Wikipedia uses Unicode.

By rderooy

Paladin (686)

12-03-2018, 17:19

Wow. And there I thought all the Windows codepages in the '90s were a mess...
