MSX character set -> Unicode HELP NEEDED

By Manuel

Ascended (19804)

02-10-2019, 10:48

HELP needed from Brazilian, Japanese, Korean, Russian and Arabic MSX users!

As you may be aware, the “Terminals Working Group” proposed new characters to be included in Unicode (also picked up here), including some found only in the MSX character set. The person who worked out the MSX mappings is Rebecca Bettencourt of KreativeKorp. The proposal includes a mapping of the full character set to Unicode, for MSX (international) and a lot of other legacy systems.

After reading the news I took a look at it and spotted some minor errors and omissions in the MSX mapping and I also made a dump of the character set of several other MSX systems, to be able to make the mapping for all used character sets on MSX.

Rebecca's group makes a "Video" mapping (which byte in VRAM is rendered as which character) as well as an "Interchange" mapping (which code is sent to the BIOS to produce which character).

So, sure, I can review the international character set, but for the Brazilian, Korean, Japanese, Russian and Arabic sets I'm not much help. But there are many of you who can help here!

I made a dump of the character sets of several machines here (showing directly the video mapping): https://msx.pics/album/lW6e
Rebecca used these to create draft Unicode mappings (Video and Interchange) and made them available for review here: https://drive.google.com/drive/folders/1hSleqcdRihNTM3lAYDzR...

One specific question for you Brazilian users: the later Gradiente machines (e.g. Expert DDPlus) and the Sharp Hotbit 1.2 machine have a Cz glyph on character 0x9E. International contains the Pt (Peseta) symbol there and older Brazilian machines the Cruzeiro Cr (₢) symbol. I guess the Cz is the Cruzado, which temporarily replaced the Cruzeiro, according to Wikipedia.
Should the Cz symbol be mapped separately or can we just use the Cruzeiro symbol there? Note that at the moment, there is no Unicode code point for the Cruzado.

Another question is: are the interchange mappings OK? The interchange mappings are the same as the video mappings, except they map the control sequences for characters 0x00-0x1F and don't include 0x7F (where defined). We assumed all MSX computers handled these characters the same, but if you know differently please let us know. :)
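
To make the relationship between the two tables concrete, here is a rough Python sketch. It is only an illustration: the function name `interchange_from_video`, the dict representation and the sample values are hypothetical, and mapping codes 0x00-0x1F to the same-numbered Unicode control code points is an assumption, not something taken from Rebecca's mapping files.

```python
def interchange_from_video(video):
    """Derive a draft interchange table from a video table.

    `video` maps byte values (0x00-0xFF) to Unicode strings.
    """
    interchange = {}
    for code, char in video.items():
        if code < 0x20:
            # Assumption: control codes map to the same-numbered
            # Unicode control code point instead of the VRAM glyph.
            interchange[code] = chr(code)
        elif code == 0x7F:
            # 0x7F is left out of the interchange mapping.
            continue
        else:
            # Everything else is identical to the video mapping.
            interchange[code] = char
    return interchange


# Toy excerpt of a video table; the glyphs for 0x01 and 0x7F are
# placeholders, 0x9E is the Peseta sign of the international set.
sample_video = {0x01: "\u263A", 0x41: "A", 0x7F: "\uFFFD", 0x9E: "\u20A7"}
sample_interchange = interchange_from_video(sample_video)
```

If the real machines treat some of these codes differently, the `code < 0x20` branch is where the sketch would have to change.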

What will happen with your input?

  1. Rebecca will use it to update the proposals to the official Unicode standard. These mappings will be attached so the coverage can be checked and, if necessary, new symbols will be proposed.
  2. It will be used to improve the input (type with keyboard, type from file, paste from clipboard) and output (copy to clipboard) routines of openMSX. I'm working on that myself.

Please post your ideas, concerns, corrections and comments in this forum thread.

Thanks a lot in advance!

For your convenience, everything grouped together:

International

This is what we started with: the international mapping. Reviewed by me and cross-checked with what we had in the openMSX Unicode mapping. Rebecca and I believe (after some discussion) this mapping is now OK.

Character set (used on International, USA, GB, German, French and Spanish (including Argentinian) machines):

For German (DIN) it looks like this (taken from Sony HB-F700D (German) or Panasonic CF-2700 (German version)):

Video mapping
Interchange mapping

Japanese

Although different styles of characters ("fonts") are used, the mapping seems to be constant. Note that I only sampled 3 different machines. If you know about different character sets for Japanese machines, please let me know.

This is the character set as you can find it in a Sony HB-F900/National CF-2000 and Panasonic FS-A1GT respectively:

Please review:
Video mapping
Interchange mapping

Russian
Character set (taken from Yamaha YIS-805/128R2/Yamaha YIS-503IIIR):

Please review:
Video mapping
Interchange mapping

Korean
Character set (taken from Daewoo CPC-400S and DPC-180):

Please review:
Video mapping
Interchange mapping

Brazilian

There are several variants, so they all have their own mapping. As explained above, for now we assumed Cr == Cz (at 0x9E), which leads to these variants:

Gradiente Expert XP-800:

Please review:
Video mapping
Interchange mapping

Sharp Hotbit 1.1:

Please review:
Video mapping
Interchange mapping

Sharp Hotbit 1.2 and Gradiente Expert DDPlus (respectively):

Please review:
Video mapping
Interchange mapping

Arabic

Character set of Bawareth Perfect MSX1 and Yamaha AX500 (respectively):

Please review:
Video mapping
Interchange mapping

Character set of Al Alamiah AX-170:

Please review:
Video mapping
Interchange mapping

Bonus: SVI-328
As a bonus, if you're familiar with the SVI-328, you can help review this one as well.

Character set (probably international):

Please review:
Video mapping

If someone can explain how the interchange mapping works for these machines, we're interested to know.

By NYYRIKKI

Enlighted (6126)

02-10-2019, 16:06

Manuel wrote:

If someone can explain how the interchange mapping works for these machines, we're interested to know.

If you mean how to input inverted characters: you don't. You just input characters. What you see in the video mapping are just alternative fonts that, on input, get exactly the same ASCII values as their normal counterparts. On output you can use VT-52 control codes to change the font, so one might think the code for inverted "A" is 0x1B70411B71 ...but it is not. It is just 0x41... just like with normal "A".
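
To illustrate the point, a small Python sketch of the byte sequence mentioned above (the helper name `inverted` is hypothetical): the escape codes only wrap the text to switch the font on and off, while the character byte itself stays 0x41.

```python
ESC = 0x1B  # escape byte that starts a VT-52 control code


def inverted(text: bytes) -> bytes:
    """Wrap text in ESC p (inverted on) and ESC q (inverted off)."""
    return bytes([ESC, ord("p")]) + text + bytes([ESC, ord("q")])


out = inverted(b"A")
print(out.hex())  # 1b70411b71
```

So an interchange mapping would only need an entry for 0x41; the inverted look is state set by the surrounding control codes.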

By Manuel

Ascended (19804)

02-10-2019, 18:43

I mean, how can you print these characters in BASIC? All of them. (Without using VPOKE as I did to generate the table.)

By NYYRIKKI

Enlighted (6126)

02-10-2019, 19:10

Try something like:

10 for r=112 to 113:for i=32 to 126:?chr$(i);:next:?chr$(27);chr$(r);:next
20 for i=160 to 223:?chr$(i);:next

(I don't know about VPOKE character 127)

By NYYRIKKI

Enlighted (6126)

03-10-2019, 07:51

NYYRIKKI wrote:

(I don't know about VPOKE character 127)

Sorry, my bad... I meant 95 (=Empty) and 191 (=Cursor)

By wbahnassi

Master (215)

03-10-2019, 10:15

Visual inspection of the Arabic tables is a bit challenging, as it sometimes conceals the true form of the character (initial, middle, final) because the character is drawn right next to two other characters from adjacent columns. If only the columns were separated by spaces, it would reduce the doubt... What did you use to draw those tables? I can confirm on my AX-170 here.

By gdx

Enlighted (6622)

03-10-2019, 11:22

The display of characters is relative on Arabic and Korean MSXs.

By GreyWolf

Champion (433)

03-10-2019, 12:24

That's about MSX Russification

By Manuel

Ascended (19804)

03-10-2019, 14:19

I used this program:

10 SCREEN1:KEYOFF:COLOR1,15,15:DEFINTA-Z
20 RS=32:WIDTHRS:OX=(RS-&HF)\2:OY=(24-&HF)\2
30 FOR R=0 TO &HF
40 LOCATE OX+R,OY-1:PRINTHEX$(R)
50 FOR C=0 TO &HF
60 X=OX+C
70 Y=OY+R
80 IF R=0 THEN LOCATE OX-1,OY+C:PRINT HEX$(C)
90 VPOKE BASE(5)+X+(RS*Y),C*&H10+R
100 NEXT C,R
110 GOTO 110

Can you modify it to include the required spacing? If not I can attempt it later.
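
For reference, the placement arithmetic of the program above can be sketched in Python. The `step` parameter is a hypothetical addition (it is not in the BASIC listing) showing one way the columns could be spread apart. With `step=1` it is meant to follow the original layout, but this has not been run on real hardware.

```python
def cell(c, r, step=1, width=32, height=24):
    """Return (name table offset, character code) for column c, row r.

    Mirrors lines 20-90 of the BASIC listing; `step` is the assumed
    horizontal distance between two columns (1 = original layout).
    """
    ox = (width - 0xF * step) // 2  # OX=(RS-&HF)\2 when step is 1
    oy = (height - 0xF) // 2        # OY=(24-&HF)\2
    x = ox + c * step
    y = oy + r
    # Offset relative to BASE(5), and the code that gets VPOKEd there.
    return x + width * y, c * 0x10 + r


print(cell(4, 1))           # offset and code for "A" (0x41)
print(cell(15, 0, step=2))  # last column with one blank cell in between
```

With `step=2` the sixteen columns span 31 cells, so they still fit within WIDTH 32.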

By wbahnassi

Master (215)

03-10-2019, 15:44

Thanks Manuel, yeah I can modify it, no worries. I'll do it when I get back from work tonight :) But I think the mappings will need to be updated for Arabic from what I see so far.

By Parn

Paladin (864)

03-10-2019, 16:43

Manuel wrote:

One specific question for you Brazilian users: the later Gradiente machines (e.g. Expert DDPlus) and the Sharp Hotbit 1.2 machine have a Cz glyph on character 0x9E. International contains the Pt (Peseta) symbol there and older Brazilian machines the Cruzeiro Cr (₢) symbol. I guess the Cz is the Cruzado, which temporarily replaced the Cruzeiro, according to Wikipedia.
Should the Cz symbol be mapped separately or can we just use the Cruzeiro symbol there? Note that at the moment, there is no Unicode code point for the Cruzado.

Hi, I'm Brazilian and I'll try to help. I don't think it's useful at all to waste space on the Cruzado. It was such a short-lived currency and it's usually written as two letters anyway. ₢ is a bit more useful because it was common on typewriters from the Cruzeiro era, even though I'm hard pressed to think of a situation related to MSX where it would be useful today (the Brazilian currency has been the Real since 1994).
