Entering BASIC source listings with an OCR tool instead of typing

By st1mpy

Hero (564)


23-10-2020, 12:57

Is there any way to do that? I'd like to scan the source code listing pages from magazines and convert them to an MSX .bas file.


By theNestruo

Master (228)


23-10-2020, 13:11

Probably not the answer you are looking for, but my first step would be looking for the game in this page: http://msxbasic.blogspot.com/ Maybe you are lucky and Ryback already typed it!

For the OCR route, I guess the OCR gives you plain text. That text can be saved as an ASCII file (.ASC), loaded in an emulator, and then saved back as tokenized BASIC (.BAS). I don't know if there is a shorter path.
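The plain-text-to-.ASC step could be sketched roughly like this. It is only a minimal sketch: the function name `to_msx_ascii` is made up for illustration, and it assumes that MSX-BASIC wants CR+LF line endings and a trailing Ctrl-Z (0x1A) end-of-file byte in an ASCII listing, which is worth checking against a file you saved yourself with `SAVE "NAME.ASC",A`.

```python
def to_msx_ascii(ocr_text: str) -> bytes:
    """Turn OCR output into bytes for an MSX ASCII (.ASC) listing.

    Hypothetical helper, not part of any existing tool.
    """
    lines = []
    for raw in ocr_text.splitlines():
        line = raw.strip()
        if line:  # drop blank lines left behind by the OCR
            lines.append(line)
    # Assumption: CR+LF after every program line, then a Ctrl-Z (0x1A)
    # end-of-file marker, as in files written by SAVE "NAME.ASC",A.
    body = "\r\n".join(lines) + "\r\n"
    # Characters outside ASCII (likely OCR noise) become "?" so they
    # are easy to spot later.
    return body.encode("ascii", errors="replace") + b"\x1a"
```

Write the result to a disk image or directory-as-disk as, say, LISTING.ASC, then in the emulator do `LOAD "LISTING.ASC"` followed by `SAVE "LISTING.BAS"` to get the tokenized file.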

By FiXato

Scribe (1642)


23-10-2020, 13:33

theNestruo wrote:

Probably not the answer you are looking for, but my first step would be looking for the game in this page: http://msxbasic.blogspot.com/ Maybe you are lucky and Ryback already typed it!

For the OCR route, I guess the OCR gives you plain text. That text can be saved as an ASCII file (.ASC), loaded in an emulator, and then saved back as tokenized BASIC (.BAS). I don't know if there is a shorter path.

With openMSX you could probably paste it directly into BASIC.

You'd still depend on the quality of the OCR and on how the listing was formatted.
Have a look, for example, at some of the earlier listings in the first MCM Listingboek.
They have narrow columns, fonts that might not be easily recognised, listings that by the looks of it were originally printed and then scanned, a checksum column that would require column selection, and word-wrapping.
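To give an idea of the word-wrapping problem, here is a minimal sketch of how wrapped lines might be glued back together after OCR. It assumes that every real program line starts with a line number followed by a space and that anything else is a continuation of the previous line; `unwrap_listing` is a name made up for this example. A continuation that happens to start with a number and a space would be mistaken for a new line, so the result still needs checking by eye.

```python
import re

# A program line starts with its line number followed by whitespace.
LINE_START = re.compile(r"^\d+\s")


def unwrap_listing(ocr_lines):
    """Join wrapped fragments back onto the program line they belong to."""
    program = []
    for raw in ocr_lines:
        text = raw.strip()
        if not text:
            continue  # skip empty OCR lines
        if LINE_START.match(text) or not program:
            program.append(text)   # new program line
        else:
            program[-1] += text    # continuation of the previous line
    return program
```

For example, `unwrap_listing(['10 PRINT "HELLO WOR', 'LD"', '20 GOTO 10'])` gives back two program lines. Note that the fragments are joined without a space, which fits listings that were hard-wrapped at a fixed column; a space that fell exactly on the wrap point would be lost.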

The PDF actually already supports text selection, but it looks like its text layer already had issues detecting line and word boundaries.

By Briqunullus

Master (203)


23-10-2020, 14:17

I did a few tests a while ago. First you'll need high-resolution scans; I think I used 300 dpi. Then you may need to convert the images to black and white and enhance the contrast, depending on what colors the magazine used. Those images can be processed by OCR, but the output will still contain errors. So the final step would be to verify the checksum for each line.
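The checksum step could look something like the sketch below. Each magazine used its own checksum scheme, so `simple_checksum` here is purely a hypothetical placeholder (sum of the character codes modulo 256) and would have to be replaced by the real algorithm from the magazine's checker program. The point is only the shape of the check: compare every OCR'd line against its printed checksum and list the ones that don't match.

```python
def simple_checksum(line: str) -> int:
    # Hypothetical placeholder, NOT any magazine's real scheme:
    # sum of the character codes, modulo 256.
    return sum(line.encode("ascii", errors="replace")) % 256


def find_suspect_lines(pairs, checksum=simple_checksum):
    """Return the program lines whose checksum differs from the printed one.

    pairs: iterable of (program_line, printed_checksum) tuples.
    checksum: function implementing the magazine's scheme (assumed here).
    """
    suspects = []
    for text, printed in pairs:
        if checksum(text) != printed:
            suspects.append(text)
    return suspects
```

Only the lines it returns need to be compared with the scan by hand, for instance a line where the OCR read `G0TO` instead of `GOTO`.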

It is quite a task, but it's still a lot quicker than typing everything.