Thursday 18 September 2014

Do you play English? Part 3

In this post I will continue to write about translating games for the ScummVM project. This is the last part of a three-part series.

Part 3: Translate a game into a new language


Some of the games for which we released a freeware version are from eastern Europe and were not released in English. So to give them a wider audience we decided to add an English translation.

The first such game was Dragon History, a Czech game for which a GSoC student added support in ScummVM in 2009, with the help of the original developer. The game was only released in Czech and Polish originally, but German and English translations have been added. If you want to know more about this game, see the official web site: http://www.ucw.cz/draci-historie/index-en.html

Since I don't know much about Dragon History myself, in this post I will focus on two Polish games from LK Avalon. The first one, Soltys, has been supported since ScummVM 1.5. It is available to download for free on our web site, and in addition to the original Polish version, we have English and Spanish translations.

The second game I will write about is Sfinx. It is very similar to Soltys in the way it works, and support for it in ScummVM was added during this year's GSoC. We are currently working on the English translation and very soon (maybe tomorrow?) we intend to make it available so that non-Polish ScummVM users can test the game, report bugs and also suggest improvements to the translation.

Edit: the call for tests is now live!

Both Soltys and Sfinx have two data files named vol.dat and vol.cat. The latter is a catalog that lists the files present in the former and the offset at which each one starts. So when the game needs a file, it looks in the catalog to find where to start reading in the vol.dat file. To edit the game files, however, we first need to extract them. Then we can repackage them into a new vol.dat file, generating a new catalog file in the process. We have two tools to perform the extraction and packaging, and they work for both Soltys and Sfinx (despite some minor differences in the file format).
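To make the catalog idea concrete, here is a toy sketch in Python. It is not the real vol.cat format (which is not described in this post): the catalog is just a dictionary mapping a file name to an offset and a size, and the function names pack and extract are made up for the illustration.

```python
# Toy model only: the real vol.cat/vol.dat layout is NOT described in this
# post, so the catalog here is a plain dict of name -> (offset, size).

def pack(files):
    """Concatenate the file contents and record where each one starts."""
    data = bytearray()
    catalog = {}
    for name, content in files.items():
        catalog[name] = (len(data), len(content))
        data += content
    return bytes(data), catalog


def extract(data, catalog, name):
    """Look the name up in the catalog, then slice the file out of the data."""
    offset, size = catalog[name]
    return data[offset:offset + size]


files = {"CGE.SAY": b"1:22=Hello", "02ZSYP.SPR": b"Type=AUTO"}
vol_dat, vol_cat = pack(files)
print(extract(vol_dat, vol_cat, "02ZSYP.SPR"))
```

Repackaging after an edit is then just a matter of calling pack again, which is why a new catalog has to be generated each time: every offset after the edited file may have moved.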

Once extracted, you will have a lot of files. All the dialogs are in a file named CGE.SAY. The hotspot names are in the files with the SPR extension. The other files can be ignored (they will be needed when repackaging the game though).

So what does CGE.SAY look like? Here is a small portion of it that shows almost everything there is to know:

;--Anna above.
 1:22=Oh, what a nice pussy!|I would love to have one
;--Vincent in the dark
 1:31=Where's the light? I can't see
 1:32=There should be a shutter,|let's try to lift it

;======================================================================

;--Vincent about the cleaning stuff
 2:01=Cleaning? Never!|It's for the girls!
;--Anna about the cleaning stuff
 2:02=Isn't there a gentleman around?

Lines starting with a semicolon are comments. There are a lot of them, which is a great help.
Dialog lines start with xx:yy as you can see above. The xx is the room number. So in the example above we have a portion of the dialogs for the first two rooms. The yy is the text number in this room.
The pipe indicates a line break. So for example the first text of the second room is displayed in game on two lines: "Cleaning? Never!" on the first and "It's for the girls!" on the second.



Simple, isn't it?
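If you want to process the file with a script (to list the texts of a room, for example), the rules above are easy to code. Here is a minimal Python sketch. The function name parse_say is mine, and I am assuming that the room and text numbers are always plain decimal numbers.

```python
def parse_say(text):
    """Return {(room, number): [display lines]} from CGE.SAY-style text."""
    dialogs = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # blank line or comment
        key, sep, message = line.partition("=")
        if not sep:
            continue  # not a dialog line
        room, _, number = key.partition(":")
        # The pipe marks a line break in the displayed text.
        dialogs[(int(room), int(number))] = message.split("|")
    return dialogs


sample = """\
;--Vincent about the cleaning stuff
 2:01=Cleaning? Never!|It's for the girls!
;--Anna about the cleaning stuff
 2:02=Isn't there a gentleman around?
"""
dialogs = parse_say(sample)
for line in dialogs[(2, 1)]:
    print(line)
```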
Now let's have a look at one of the SPR files, for example 02ZSYP.SPR. As the name suggests, this is one of the hotspots in the second room. The start of the file looks like this in the Polish version:

Type=AUTO
Name=zsyp na <98>mieci

[phase]
02zsyp00
02zsyp01
02zsyp02

[seq]
 0   -2   0   0  0   8
 1    3  84   2 127  8  .OTWIERA
 1    0  85   2 127  8  .ZAMYKA

 2   -2   0   0  0   8

[ftake]
say    -2    2:5  brudny

[mtake]
reach  -2  2:7     . zsyp
SOUND  2:7 2:84
pause   -1 72
SAY    -2  2:4
NEXT   -1   0      . smiec popycha

The name is what appears on screen when moving the cursor to the hotspot. We can now also see that the file is named after the hotspot name. This makes it easy to find a file when you know the hotspot name... in Polish (not so easy when you know it in English ;) ).

The <98> is the way my text editor displays non-ASCII characters using their hexadecimal value (so in decimal we have here character 152). In this case the character is ś. The game is using the CP852 encoding (with only the example above it could also have been using the Mazovia encoding, but other characters make it possible to tell the two apart). Fortunately English does not use many non-ASCII characters, so we don't have to deal with this much.
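If your text editor does not know CP852, a few lines of Python can do the conversion. This is only a small sketch: Python ships a cp852 codec, and the raw bytes below are the Name line as it appears in the file.

```python
# The Name line exactly as stored in the file: 0x98 is one CP852 byte.
raw = b"Name=zsyp na \x98mieci"

decoded = raw.decode("cp852")
print(decoded)

# Encoding goes the other way when writing a translated file back.
encoded = decoded.encode("cp852")
```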

So, the Polish name is zsyp na śmieci. Google Translate tells me (I don't speak Polish myself) that it translates into garbage chute. So let's modify the second line in the file and see how it looks:

Type=AUTO
Name=garbage chute

[phase]
02zsyp00
02zsyp01
02zsyp02
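With many SPR files to translate, the edit above could be scripted. Here is a hedged Python sketch: the function name set_hotspot_name is made up, and the DOS line endings in the sample are my assumption (the code keeps whatever line ending it finds).

```python
def set_hotspot_name(spr_bytes, new_name, encoding="cp852"):
    """Replace the value of the first Name= line, leaving the rest untouched."""
    out = []
    done = False
    for line in spr_bytes.splitlines(keepends=True):
        if not done and line.startswith(b"Name="):
            # Keep the original line ending, whatever it is.
            ending = line[len(line.rstrip(b"\r\n")):]
            line = b"Name=" + new_name.encode(encoding) + ending
            done = True
        out.append(line)
    return b"".join(out)


# Sample data; the \r\n line endings are an assumption.
polish = b"Type=AUTO\r\nName=zsyp na \x98mieci\r\n\r\n[phase]\r\n02zsyp00\r\n"
english = set_hotspot_name(polish, "garbage chute")
print(english.decode("cp852"))
```

Working on bytes rather than on decoded text means the other lines are written back exactly as they were read.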




For Sfinx, the bulk of the work was done by Strangerke and then I made a couple of passes to improve the English and fix spelling mistakes. Uruk, the GSoC student who worked on the engine, also made some modifications.

For Soltys, the Polish to English translation was done by neutron and the Spanish version is from IlDucci and The FireRed. I am currently working on a French translation as well.

The process I explained above is therefore very similar to what I explained in the previous post to improve an existing translation for Drascula:

  • Unpack the data file.
  • Edit the dialogs and hotspot names.
  • Repack.

However there is one major difference. Because the game was only released in Polish in the first place, the font data does not contain all the characters we need for other languages. For English this is not an issue, unless you happen to use a word borrowed from French, such as déjà vu or café. When translating to French however you need those accented characters. So there is one more step to do: modify the font data (which was done by Strangerke on Soltys).

The font is stored in a file called CGE.CFT. This is a simple bitmap font, in which each pixel is either black (or another color) or transparent. So we need one bit to store a pixel. If the bit is 1, the pixel is visible, and if the bit is 0, the pixel is not visible. The height of the font is 8 pixels, so one column of a character conveniently fits in one byte (because in case you don't already know, 1 byte contains 8 bits). The width is variable, and if for example a character is 4 pixels wide, thus 4x8 pixels, its data is coded on 4 bytes. And there are 256 possible characters.

The font file starts with the width, coded on one byte, of each character. That takes the first 256 bytes. Then the bitmap starts. Here is the start of the file for Sfinx displayed with hexadecimal values. The first column is the address (also in hexadecimal). We have 16 bytes on each line. A star denotes one or more lines that are identical to the previous line.

0000000 04 06 06 06 06 06 06 04 04 04 04 04 04 04 04 04
0000010 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04
0000020 04 02 04 06 04 05 05 02 04 04 03 04 02 03 02 03
0000030 04 04 04 04 04 04 04 04 04 04 02 02 04 04 04 05
0000040 05 05 05 05 05 05 05 05 05 02 04 05 04 06 05 05
0000050 05 06 05 05 06 05 04 06 04 06 05 03 03 03 04 05
0000060 04 05 04 04 04 05 03 04 04 02 03 04 03 06 04 04
0000070 04 04 04 05 03 04 04 06 04 04 04 04 02 04 06 06
0000080 04 04 04 04 04 04 04 04 04 04 04 04 04 05 04 05
0000090 04 04 04 04 04 04 04 05 05 04 04 04 04 04 04 04
00000a0 04 04 04 04 05 05 04 04 05 05 04 04 04 04 04 04
00000b0 04 04 04 04 04 04 04 04 04 04 04 04 04 05 04 04
00000c0 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04
*
00000e0 05 04 04 05 04 04 04 04 04 04 04 04 04 04 04 04
00000f0 04 04 04 04 04 04 04 04 04 04 04 04 04 04 03 04
0000100 00 00 00 00 1e 29 2f 29 1e 00 1e 2b 2f 2b 1e 00
0000110 0e 1f 3e 1f 0e 00 0c 1e 3f 1e 0c 00 1c 5b 7f 5b
0000120 1c 00 1c 5e 7f 5e 1c 00 ff ff ff 00 ff ff ff 00
0000130 ff ff ff 00 ff ff ff 00 ff ff ff 00 ff ff ff 00
*
0000180 ff ff ff 00 ff ff ff 00 ff ff ff 00 00 00 00 00
0000190 2f 00 03 00 03 00 14 7f 14 7f 14 00 26 7f 32 00
00001a0 13 0b 34 32 00 1a 25 1a 28 00 03 00 3c 42 81 00
00001b0 81 42 3c 00 06 06 00 08 1c 08 00 60 00 08 08 00
00001c0 20 00 38 07 00 3f 21 3f 00 22 3f 20 00 3b 29 2f
00001d0 00 31 25 3f 00 0f 08 3f 00 37 25 3d 00 3f 25 3d
00001e0 00 01 3d 03 00 3f 25 3f 00 37 25 3f 00 24 00 64
00001f0 00 08 14 22 00 14 14 14 00 22 14 08 00 02 29 05
0000200 02 00 1e 21 2d 0e 00 3c 0a 09 3f 00 3f 25 26 18
0000210 00 1f 21 21 12 00 3f 21 22 3c 00 3f 25 25 20 00
0000220 3f 05 05 01 00 1e 21 29 19 00 3f 04 04 3f 00 3f

If we look at the first few lines, we can see that the characters are between 2 and 6 pixels wide.
Let's try to have a look at the start of the alphabet. In the ASCII table, we can see the value of the letter A is 65, and since values start at 0, that means this is the 66th character. So first we will compute the sum of the widths of the first 65 characters.
That would be 4 + 6 + 6 + 6 + ... + 4 + 4 + 4 + 5 = 263
So if we skip the first 256 bytes (the character widths) and then the next 263 bytes, we should get the data for letter A. So let's look at the data that starts at address 256 + 263 = 519 (207 in hexadecimal).
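The same addition can be left to a few lines of Python. This is only a sketch: the function name glyph_offset is mine, and the width values are the first 66 bytes copied from the dump above.

```python
WIDTH_TABLE_SIZE = 256  # one width byte per possible character


def glyph_offset(widths, code):
    """Offset of a character's data: the width table, then all earlier glyphs."""
    return WIDTH_TABLE_SIZE + sum(widths[:code])


# Widths of characters 0 to 65, copied from the dump above.
widths = bytes.fromhex(
    "04 06 06 06 06 06 06 04 04 04 04 04 04 04 04 04 "
    "04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 04 "
    "04 02 04 06 04 05 05 02 04 04 03 04 02 03 02 03 "
    "04 04 04 04 04 04 04 04 04 04 02 02 04 04 04 05 "
    "05 05"
)

offset = glyph_offset(widths, ord("A"))
print(offset, hex(offset), widths[ord("A")])
```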
The width for the 66th character (the second byte on the line at address 0000040 above) is 5, so its data is the 5 bytes starting at address 0x207.
Let's write them, with the corresponding binary representation below (with the least significant bit at the top):
 3c 0a 09 3f 00
 0  0  1  1  0
 0  1  0  1  0
 1  0  0  1  0
 1  1  1  1  0
 1  0  0  1  0
 1  0  0  1  0
 0  0  0  0  0
 0  0  0  0  0

So now a bit of ASCII art: we replace each 1 with an @ and each 0 with a space

     @ @
   @   @
 @     @
 @ @ @ @
 @     @
 @     @

You recognize something?
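Doing this by hand for every character gets tedious, so here is a small Python sketch that does the same conversion. The function name render_glyph is made up; the bytes are the five found at address 0x207.

```python
def render_glyph(columns, on="@", off=" "):
    """Return 8 text rows for a glyph stored as one byte per column.

    Bit 0 (the least significant bit) of each byte is the top pixel.
    """
    rows = []
    for bit in range(8):
        rows.append("".join(on if (col >> bit) & 1 else off for col in columns))
    return rows


glyph = bytes.fromhex("3c 0a 09 3f 00")
for row in render_glyph(glyph):
    print(row)
```

Unlike the drawing above, this prints one character per pixel with no extra spacing, so the result looks narrower.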

Just for fun, let's do the same for the next two letters:

 @ @       @ @ @  
 @   @     @     @
 @ @ @     @      
 @     @   @      
 @     @   @     @
 @ @ @       @ @

So we can edit the font file using a hexadecimal editor for example. This involves some ASCII art (exciting :-), and it can be challenging to fit an accented character on 5x8 pixels), some additions on hexadecimal numbers and some conversions between binary and hexadecimal (boring :-( ).
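The boring part can be scripted too. Below is a hedged Python sketch that goes from ASCII art back to bytes and splices a glyph into the font data. The names encode_glyph and replace_glyph are made up, and I am assuming the file contains nothing but the 256 widths followed by the bitmap, as described above.

```python
def encode_glyph(rows, on="@"):
    """Turn up to 8 rows of ASCII art into one byte per column (bit 0 = top)."""
    if len(rows) > 8:
        raise ValueError("the font is only 8 pixels high")
    width = max(len(row) for row in rows)
    columns = bytearray(width)
    for bit, row in enumerate(rows):
        for x, char in enumerate(row):
            if char == on:
                columns[x] |= 1 << bit
    return bytes(columns)


def replace_glyph(font, code, columns):
    """Return new font data with one glyph replaced and its width updated.

    Assumes the layout is exactly 256 width bytes followed by the bitmap.
    """
    widths = bytearray(font[:256])
    start = 256 + sum(widths[:code])
    end = start + widths[code]
    widths[code] = len(columns)
    return bytes(widths) + font[256:start] + bytes(columns) + font[end:]


art = [
    "  @@ ",
    " @ @ ",
    "@  @ ",
    "@@@@ ",
    "@  @ ",
    "@  @ ",
]
encoded = encode_glyph(art)
print(encoded.hex(" "))
```

Since the widths are stored per character, a replacement glyph can be wider or narrower than the original: only the width byte changes and the following glyphs simply shift.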

This concludes my three-part series on translating games for ScummVM. I hope you found it interesting. Now I will take some rest while you start testing Sfinx. There is one last thing though: ScummVM is a community effort, and it does not only involve software development. You can contribute in other ways, such as translating freeware games, translating ScummVM itself or helping with the user manual. So if you are motivated to help us, please get in touch for example on our IRC channel (#scummvm on irc.freenode.net) or forum.

