Tuesday, 16 September 2014

Do you play English? Part 1

One of my main responsibilities in the ScummVM team over the past few years has been working on translations. There are two aspects to it:
  • Translating the ScummVM software itself.
  • Translating games.
Concerning the first point, I wrote some of the code that handles translations in an efficient and portable way in ScummVM. I also maintain the French translation and coordinate the work of the translators for the other languages. I may write a post on that topic later. But first I will write a series of three posts focusing on the second point: translating games. I will present several examples to show the variety of work this can involve.

In some cases we can slightly improve a translation for a game without having to modify the data files. I will write about that in this first part. But to turn the Beast into Prince Charming, a facelift is not enough and we need more invasive surgery. Such an operation is only possible when we have access to the data files. We have good relations with some game companies, and they have allowed us to provide some formerly commercial games as freeware on our website. This is what made this work possible, and I will present it in parts 2 (improving the original translation) and 3 (adding a new translation) over the next few days.

Note: This blog post contains embedded source code examples that are not visible on the RSS feed.

Part 1: Fixing a few missing or wrong strings in a game


Some games have minor issues with their official translations. Sometimes a subtitle is missing, and sometimes there is a glaring spelling or grammatical mistake. Considering my involvement with the Broken Sword game engine (see my previous post), what better example to start with than Broken Sword?

In 2008, it was reported that an error message was displayed instead of the correct subtitle in one place, when George says "Oh?". I grant you this was not a very critical subtitle.

Here is the code that gets the subtitle to display from its ID.

As you can see, it is quite simple.
On the first two lines, knowing the text ID and the language, it asks the Resource Manager for the data, which in this case comes from the text.clu file (text.clm for the Mac version). This file contains many blocks. A block contains:
  • A 20-byte header
    • Bytes 0-5: resource type (here "ChrTxt")
    • Bytes 6-7: version
    • Bytes 8-19: compression information (compressed size, compression type and uncompressed size)
  • The number of strings in the block, encoded on 4 bytes
  • For each string, the offset at which it starts (relative to the end of the block header), again encoded on 4 bytes
  • And finally the strings themselves
Here the version is always 1 and the compression is always "NONE", so we can ignore the header altogether: the code simply skips it without even looking at its content.
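The block layout above can be sketched in C++ as follows. This is a minimal illustration, not the ScummVM code: `readBE32`, `readOffsets` and `kHeaderSize` are hypothetical names, and the sketch assumes big-endian numbers as in the Mac text.clm file (the PC file would presumably need the little-endian equivalent).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Size of the block header (type, version, compression info), which is skipped.
constexpr std::size_t kHeaderSize = 20;

// Reads a 4-byte unsigned value stored big-endian (assumption: Mac data file).
std::uint32_t readBE32(const std::vector<std::uint8_t> &data, std::size_t pos) {
    return (std::uint32_t(data[pos]) << 24) | (std::uint32_t(data[pos + 1]) << 16) |
           (std::uint32_t(data[pos + 2]) << 8) | std::uint32_t(data[pos + 3]);
}

// Returns the string offsets of one block. The header is skipped without being
// inspected; the string count and the offsets follow it, each on 4 bytes.
std::vector<std::uint32_t> readOffsets(const std::vector<std::uint8_t> &block) {
    const std::uint32_t count = readBE32(block, kHeaderSize);
    std::vector<std::uint32_t> offsets;
    offsets.reserve(count);
    for (std::uint32_t i = 0; i < count; ++i)
        offsets.push_back(readBE32(block, kHeaderSize + 4 + 4 * std::size_t(i)));
    return offsets;
}
```

Since offsets are relative to the end of the header, the first string of a block with N strings starts at offset 4 + 4 × N (right after the count and the offset table).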

The next few lines check that the string index within the block is smaller than the number of strings. The text ID is coded on 4 bytes: the two highest bytes identify the resource block (dividing the ID by ITM_PER_SEC, which is 0x10000, gives the block) and the two lowest bytes identify the string within the block (masking the ID with ITM_ID, which is 0xFFFF, gives the string index).
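To make the ID split concrete, here is a small C++ sketch. The helper names `blockIndex` and `stringIndex` are hypothetical; only the two constants come from the engine. It also decodes the text ID from the bug report: 2950145 is 0x2D0401, that is string 1025 (0x0401) of block 45 (0x2D).

```cpp
#include <cassert>
#include <cstdint>

// Constants as named in the engine: ITM_PER_SEC = 0x10000 and ITM_ID = 0xFFFF.
constexpr std::uint32_t ITM_PER_SEC = 0x10000;
constexpr std::uint32_t ITM_ID = 0xFFFF;

// The two highest bytes select the text block...
constexpr std::uint32_t blockIndex(std::uint32_t textId) { return textId / ITM_PER_SEC; }

// ...and the two lowest bytes select the string inside that block.
constexpr std::uint32_t stringIndex(std::uint32_t textId) { return textId & ITM_ID; }

// The text ID from the bug report, checked at compile time.
static_assert(blockIndex(2950145) == 45, "2950145 is in block 0x2D");
static_assert(stringIndex(2950145) == 1025, "2950145 is string 0x0401");
```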

Then, using that index, it reads the offset at which the string starts. If the offset is zero, it returns an error string; otherwise it returns the string from the data.
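Putting these steps together, a lookup over a block already loaded in memory might look like the sketch below. It is not the engine's code: `lookupText` and `kErrorStr` are hypothetical, numbers are assumed big-endian, and since the text above does not say what happens when the index is out of range, the sketch simply returns the error string in that case too.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kHeaderSize = 20;
constexpr std::uint32_t ITM_ID = 0xFFFF;
// Hypothetical error text; the engine uses its own error string.
const char *const kErrorStr = "Error: missing text";

// Reads a 4-byte big-endian value (assumption: Mac data file).
std::uint32_t readBE32(const std::vector<std::uint8_t> &data, std::size_t pos) {
    return (std::uint32_t(data[pos]) << 24) | (std::uint32_t(data[pos + 1]) << 16) |
           (std::uint32_t(data[pos + 2]) << 8) | std::uint32_t(data[pos + 3]);
}

// Returns the string for textId. Offsets are relative to the end of the header.
const char *lookupText(const std::vector<std::uint8_t> &block, std::uint32_t textId) {
    const std::uint32_t index = textId & ITM_ID;  // string within the block
    const std::uint32_t count = readBE32(block, kHeaderSize);
    if (index >= count)                            // index out of range
        return kErrorStr;
    const std::uint32_t offset = readBE32(block, kHeaderSize + 4 + 4 * std::size_t(index));
    if (offset == 0)                               // no string for this ID
        return kErrorStr;
    return reinterpret_cast<const char *>(block.data() + kHeaderSize + offset);
}
```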

To help you visualize what I described above, here is a picture (hexadecimal and ASCII) of the start of one small block. It comes from my Mac version, so the numbers are big-endian (see my previous post).

Blue: Header
Red: number of strings (here 11 since it is in hexadecimal)
Green: the offsets at which the strings start (there are 11 of them, each coded on 4 bytes)
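As a small worked example of byte order, a string count of 11 is stored as the four bytes 00 00 00 0B in a big-endian file. The sketch below uses hypothetical helpers and assumes the PC file is little-endian; it shows how the same four bytes decode under each convention.

```cpp
#include <cassert>
#include <cstdint>

// Big-endian: most significant byte first (Mac text.clm).
std::uint32_t readBE32(const std::uint8_t *p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

// Little-endian: least significant byte first (assumed for the PC text.clu).
std::uint32_t readLE32(const std::uint8_t *p) {
    return (std::uint32_t(p[3]) << 24) | (std::uint32_t(p[2]) << 16) |
           (std::uint32_t(p[1]) << 8) | std::uint32_t(p[0]);
}
```

Reading a Mac file with the little-endian helper would turn a count of 11 into 0x0B000000, which is why the engine has to know which version of the file it is reading.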
As you may have guessed, for the particular string from the bug report the offset is zero in some languages, so we get the error message instead. The fix is simple: I identified the text ID (2950145) and hardcoded the string to use in the source code.
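The hardcoded fix can be sketched as a table consulted only when the offset is zero. This is an assumption about the shape of such a workaround, not the actual ScummVM code: `OverrideTable` and `resolveZeroOffset` are hypothetical, and the strings used for each language are not listed here, so any table contents would be placeholders.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical table of hardcoded strings for the current language, keyed by text ID.
using OverrideTable = std::map<std::uint32_t, const char *>;

// Called once the lookup has found a zero offset for textId: use the hardcoded
// string if there is one, otherwise keep returning the error string.
const char *resolveZeroOffset(std::uint32_t textId, const OverrideTable &overrides,
                              const char *errorStr) {
    const auto it = overrides.find(textId);
    return it != overrides.end() ? it->second : errorStr;
}
```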


A bit later, it was discovered that a number of subtitles were also missing from the demo, presumably because it was released before the translations were finalized. But in that case the issue was slightly different: instead of being zero, the offset itself was missing. The text ID pointed to an index larger than the number of strings in the corresponding block. So here is the current version of the code with all the workarounds:
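As an illustration of the idea (not the actual ScummVM code), the two workarounds can be sketched like this: both a zero offset (the full game) and an index past the end of the offset table (the demo) fall back to a hardcoded string. `lookupWithWorkarounds` and the override table are hypothetical names, and numbers are assumed big-endian.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

constexpr std::size_t kHeaderSize = 20;
constexpr std::uint32_t ITM_ID = 0xFFFF;

// Hypothetical table of hardcoded strings for the current language.
using OverrideTable = std::map<std::uint32_t, const char *>;

// Reads a 4-byte big-endian value (assumption: Mac data file).
std::uint32_t readBE32(const std::vector<std::uint8_t> &data, std::size_t pos) {
    return (std::uint32_t(data[pos]) << 24) | (std::uint32_t(data[pos + 1]) << 16) |
           (std::uint32_t(data[pos + 2]) << 8) | std::uint32_t(data[pos + 3]);
}

const char *lookupWithWorkarounds(const std::vector<std::uint8_t> &block, std::uint32_t textId,
                                  const OverrideTable &overrides, const char *errorStr) {
    // Use the hardcoded string for this ID if there is one, else the error string.
    auto fallback = [&]() -> const char * {
        const auto it = overrides.find(textId);
        return it != overrides.end() ? it->second : errorStr;
    };
    const std::uint32_t index = textId & ITM_ID;
    const std::uint32_t count = readBE32(block, kHeaderSize);
    if (index >= count)  // demo: the offset entry itself is missing
        return fallback();
    const std::uint32_t offset = readBE32(block, kHeaderSize + 4 + 4 * std::size_t(index));
    if (offset == 0)     // full game: the entry exists but holds no string
        return fallback();
    return reinterpret_cast<const char *>(block.data() + kHeaderSize + offset);
}
```

Checking the index against the count before reading the offset table also avoids reading past the end of the block for the demo IDs.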


I did a similar fix in Dreamweb. The command "Aller vers" ("Go to") was misspelled "Aller ver". Since it is one of those strings that appear virtually everywhere in the game, I decided to add a workaround to fix it, again by hardcoding the correct string in the source code.
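A workaround of this kind can be sketched as a simple substitution applied to the string read from the data files. The function name `fixCommandName` is hypothetical, and matching the exact misspelled text is an assumption about how such a fix could be scoped: other strings, and presumably the other languages, pass through unchanged.

```cpp
#include <cassert>
#include <cstring>

// Returns the corrected command name if the data contains the known typo,
// otherwise returns the string from the data unchanged.
const char *fixCommandName(const char *fromData) {
    if (std::strcmp(fromData, "Aller ver") == 0)
        return "Aller vers";
    return fromData;
}
```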

See you tomorrow for part 2 and more substantial changes to a game's original translation.
