Monday, 2 October 2017

Supernova - translation to English

As you may be aware I was a mentor for the ScummVM organisation for the Google Summer of Code this year. I was supervising the work by Joefish on the Mission Supernova game. The engine has not yet been merged into the main repository, but work is continuing in Joefish's GitHub fork.

At the end of the GSoC coding period the first chapter of the game was completable. However the game is only available in German. One of our aim is not only to make it possible to play this game again on all platforms supported by ScummVM, but also to play it in English. And this is what I have been focussing on in the past week.

I have actually been working on the translation for a while with the help of Joefish. We are using the ScummVM translations website to handle the translation. With previous games when working on a translation we would use more rudimentary tools, such as a shared Google Doc spreadsheet. Thus using our translations website to translate a game is an experiment, but so far I have found it a positive one. And I expect it will also allow participation from the community to complete or improve the translation (this has not been the case yet as we did not advertise this translation until now).

What I was focussing on this week is writing a tool to generate an engine data file, as we do with other engines. Until a few days ago all the text was hardcoded in the engine. But the idea is to have the text in this engine data file instead and to have the engine load all the strings when it starts. This has two main advantages:

  • The strings being into a separate file instead of being in the engine, the ScummVM executable ends up being smaller and using less memory when the engine is not running (for example when playing another game).
  • We can store the strings in several languages in this engine data file and the engine will load the strings for the language selected by the user when the game was added to ScummVM. This means we can provide translations for the game.

Format of the engine data file

The data file starts with a 3 byte identifier ('MSN') that can be used to identify the file, and a 1 byte version number so that we can change the format in the future without breaking everything.

The rest of the file is a collection of blocks, with each block having a 12 byte header and a variable size data section:
 byte 1-4   marker, for example 'TEXT' for the game text
 byte 5-8   language code, for example 'de' for German
 byte 9-12  block data size in byte
 byte 13-N  block data

This file structure means that we can easily add other types of data in the future if needed, such as the font data (that is currently hardcoded in the engine) or the mouse cursors (also currently hardcoded in the engine). The marker tells us what type of data we will find in a block. Currently the markers used are 'TEXT' (game strings), 'IMG1' and 'IMG2' (for newspaper bitmaps).

With this file format we can also easily add translations to more languages.  Currently we have data for German and English.

Generating the engine data file

The German text is hardcoded in the tool that generates the engine data file. All other data however are provided with supporting files. The tool has a list of languages for which it will look for those supporting files. Currently for each language we can have 3 files:

  • strings-xx.po
  • img1-xx.pbm
  • img2-xx.pbm

(where 'xx' is a language code)

For each file found a corresponding data block will be added in the engine data file.

The strings-xx.po file is typically generated and updated by our translations web site and contains the translation in a given language for each German string. The German strings are defined in a specific order, and when generating the TEXT block for another language the tool iterates on the German strings, for each string looks for a translation in the po file for this language, and if found write this translation and otherwise write the original German string. This means that untranslated text will appear in German when playing.

The ppm files are bitmap images in portable bitmap format for the two newspaper article images in the game. They can for example be generated with gimp. Other than the header, the ppm format happens to store the data in exactly the same format as the game does for the original bitmaps in German. So the tool simply reads the header to do some sanity checks on  the format and image size and write the rest of the file to the engine data file.

So adding a new language simply means adding its code to the language array in the tool source code, and dropping some pbm and po files in the directory where we execute the tool.

I didn't paste any source code in this post, but you can see the source code for the tool here: https://github.com/Joefish/scummvm/tree/supernova/devtools/create_supernova

Status

The game contains 655 strings. Currently 281 of those have been moved to the engine data file, and this includes the name and description for all the objects. A lot of strings are still hardcoded in the game engine though, such as all the dialogs, and they will be moved to the engine data file as well eventually.

The progress on the translation to English is similar, as 279 strings are currently translated, although some of those are for strings that have not yet been moved to the engine data file, and can thus not yet been seen translated in the game.