BMG (File Format)

From Custom Mario Kart
Revision as of 06:58, 10 April 2011 by Chadderz (talk | contribs)

A BMG file is a message file. Each message is referenced by a message ID (MID).



Raw BMG File

The message file consists of a header and three data sections:

  • The offset table into the string pool
  • The string pool with 16 bit wide characters.
  • The list of related message ids (MID).

Header

In BMG files all numbers are stored in big endian format. Each BMG file starts with a BMG header:

Offset Type Description
0x00 char[8] The magic: "MESGbmg1"
0x08 u32 Total size of the file
0x0c u32 Number of sections, usually 3
0x10 u32 Unknown value, usually 0x02000000
0x14 char[12] More unknown data (filler for alignment)

The header is followed by the sections. Each section starts with a part that is independent of the section type:

Offset Type Description
0x00 char[4] The magic of the section
0x04 u32 The total size of the section
GNU C example
#define BMG_MAGIC      "MESGbmg1"
#define BMG_MAGIC_NUM  0x4d455347626d6731ull

typedef struct bmg_section_t
{
    char           magic[4];       // a magic to identify the section
    be32_t         size;           // total size of the section
    u8             data[0];        // section data
}
__attribute__ ((packed)) bmg_section_t;

typedef struct bmg_header_t
{
    char           magic[8];       // = BMG_MAGIC
    be32_t         size;           // total size of file
    be32_t         n_section;      // number of sub sections
    be32_t         unknown1;       // = 0x02000000
    char           unknown2[0xc];  // unknown data
    bmg_section_t  section[0];     // first sub header
}
__attribute__ ((packed)) bmg_header_t;
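
As an illustration of the layout above, here is a minimal sketch that checks the header and walks the section list. It uses the standard <stdint.h> types instead of be32_t and u8 so that it compiles on its own. The helper names read_be32, bmg_check_header and bmg_find_section are hypothetical and not part of any existing tool, and the code assumes that the sections are stored back to back directly after the 0x20 byte header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BMG_HEADER_SIZE 0x20u   /* 8 + 4 + 4 + 4 + 12 bytes */

/* Read a big endian 32 bit value from a byte buffer. */
uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: return 0 and fill the outputs if `buf` starts
 * with the BMG magic, or -1 otherwise. */
int bmg_check_header(const uint8_t *buf, size_t len,
                     uint32_t *file_size, uint32_t *n_sections)
{
    if (len < BMG_HEADER_SIZE || memcmp(buf, "MESGbmg1", 8) != 0)
        return -1;
    *file_size  = read_be32(buf + 0x08);
    *n_sections = read_be32(buf + 0x0c);
    return 0;
}

/* Hypothetical helper: return a pointer to the first section with the
 * given 4 byte magic, or NULL if it is missing or a size is malformed.
 * Assumes the sections follow the header without gaps. */
const uint8_t *bmg_find_section(const uint8_t *buf, size_t len,
                                const char magic[4])
{
    size_t off = BMG_HEADER_SIZE;

    while (off + 8 <= len) {
        uint32_t size = read_be32(buf + off + 4);   /* total section size */
        if (size < 8 || size > len - off)
            return NULL;
        if (memcmp(buf + off, magic, 4) == 0)
            return buf + off;
        off += size;
    }
    return NULL;
}
```

Reading the fields byte by byte keeps the sketch independent of the byte order of the host machine.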

Text Index Table

The text index table is usually the first section of a BMG file.

Offset Type Description
0x00 char[4] The magic = "INF1"
0x04 u32 The total size of the section
0x08 u16 Number of messages (N)
0x0a u16 The size of each item, in Mario Kart always 8
0x0c u32 Unknown member, maybe an alignment filler, usually 0
0x10 N x itemsize N items of the specified size

In Mario Kart Wii each item has 2 data members:

Offset Type Description
0x00 u32 Offset into text pool of the "DAT1" section
0x04 u32 Unknown flags for that string.
GNU C example
#define BMG_INF_MAGIC      "INF1"
#define BMG_INF_STD_FLAGS  0x01000000

typedef struct bmg_inf_list_t
{
    be32_t         text_index;     // offset into text pool
    be32_t         flags;          // unknown flags
}
__attribute__ ((packed)) bmg_inf_list_t;

typedef struct bmg_inf_t
{
    char           magic[4];       // a magic to identify the section
    be32_t         size;           // total size of the section
    be16_t         n_msg;          // number of messages
    be16_t         inf_size;       // size of inf items
    be32_t         unknown;        // = 0
    bmg_inf_list_t list[0];        // data
}
__attribute__ ((packed)) bmg_inf_t;
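
The following sketch shows one way to read a single item of the text index table from a raw "INF1" section. The function name bmg_inf_get_item and the big endian readers are hypothetical helpers. The code only assumes what the tables above state: a 16 bit message count at 0x08, a 16 bit item size at 0x0a, and items that start at 0x10 and begin with two 32 bit members.

```c
#include <stddef.h>
#include <stdint.h>

/* Read big endian 16 and 32 bit values from a byte buffer. */
uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: fetch item `idx` of an INF1 section.
 * `inf` points to the section magic, `inf_len` is the number of
 * available bytes. Returns 0 on success, -1 on any range error. */
int bmg_inf_get_item(const uint8_t *inf, size_t inf_len, size_t idx,
                     uint32_t *text_offset, uint32_t *flags)
{
    if (inf_len < 0x10)
        return -1;

    size_t n_msg     = read_be16(inf + 0x08);
    size_t item_size = read_be16(inf + 0x0a);

    /* Each item must hold at least the two 32 bit members. */
    if (idx >= n_msg || item_size < 8)
        return -1;
    /* The whole item must lie inside the available data. */
    if (idx >= (inf_len - 0x10) / item_size)
        return -1;

    const uint8_t *item = inf + 0x10 + idx * item_size;
    *text_offset = read_be32(item);       /* offset into the text pool */
    *flags       = read_be32(item + 4);   /* unknown flags */
    return 0;
}
```

Using the item size stored in the section instead of a fixed 8 keeps the sketch usable for files with larger items.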

String Pool

The second section contains the string pool. Strings are sequences of 16 bit values (like UTF-16 or Windows wide chars) terminated with a NULL value. However, NULL values can also occur in the middle of a string if they are part of a 0x1a escape sequence.

Offset Type Description
0x00 char[4] The magic = "DAT1"
0x04 u32 The total size of the section
0x08 char[*] The string pool
GNU C example
#define BMG_DAT_MAGIC  "DAT1"

typedef struct bmg_dat_t
{
    char           magic[4];       // a magic to identify the section
    be32_t         size;           // total size of the section
    u8             text_pool[];
}
__attribute__ ((packed)) bmg_dat_t;

Message IDs

The third section contains the table of message IDs (MID). The number of elements in this table is equal to the number of elements in the text index table. Elements with the same table index are attributes of the same string.

Offset Type Description
0x00 char[4] The magic = "MID1"
0x04 u32 The total size of the section
0x08 u16 Number of messages (N)
0x0a u16 Unknown value, usually 0x1000
0x0c u32 Unknown member, maybe an alignment filler, usually 0
0x10 Nx u32 N message IDs (MID)
GNU C example
#define BMG_MID_MAGIC "MID1"

typedef struct bmg_mid_t
{
    char           magic[4];       // a magic to identify the section
    be32_t         size;           // total size of the section
    be16_t         n_msg;          // number of messages
    be16_t         unknown1;       // = 0x1000
    be32_t         unknown2;       // = 0
    be32_t         mid[0];         // message id table
}
__attribute__ ((packed)) bmg_mid_t;
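
Because both tables share the same index, finding the text of a message starts with a search in the MID table. The sketch below shows such a search on a raw "MID1" section. The name bmg_mid_find is a hypothetical helper, and a plain linear search is used because nothing above states that the table is sorted.

```c
#include <stddef.h>
#include <stdint.h>

/* Read big endian 16 and 32 bit values from a byte buffer. */
uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: return the table index of message ID `wanted`
 * in a MID1 section, or -1 if it is not present or the section is
 * too short. `mid` points to the section magic. */
long bmg_mid_find(const uint8_t *mid, size_t mid_len, uint32_t wanted)
{
    if (mid_len < 0x10)
        return -1;

    size_t n = read_be16(mid + 0x08);          /* number of messages */
    if (n > (mid_len - 0x10) / 4)
        return -1;                             /* table is truncated */

    for (size_t i = 0; i < n; i++)
        if (read_be32(mid + 0x10 + 4 * i) == wanted)
            return (long)i;
    return -1;
}
```

The returned index selects the item with the same index in the text index table, which in turn holds the offset into the string pool.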

0x1a Escape Sequences

Nintendo uses sequences starting with the 16 bit value 0x1a to embed binary data in the strings. This binary data may contain NULL values, which are normally used as the end of string marker.

Offset Hex Value Description
0x00 001a Start of the sequence. 0x1a is also known as CTRL-Z or ASCII SUB (substitute).
0x02 xxyy xx is the total size of the escape sequence in bytes.
yy is something like a usage ID.
0x04 nnnn The binary value. The size depends on the xx above:
  • xx=06: nnnn is a 16 bit value
  • xx=08: nnnn is a 32 bit value
  • xx=0a: nnnn is a 48 bit value

Here are some well known escape sequences:

Hex Values Description
001a 0801 xxxx xxxx Inserts the Unicode character xxxx xxxx.
001a 0602 0000 Insert the name of the current player.
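
Since an escape sequence may contain NULL values, a reader cannot simply search for the first 16 bit zero to find the end of a string. The sketch below measures a string in the pool and skips each escape sequence by its size byte xx. The name bmg_string_units is a hypothetical helper, and `pool` is assumed to point to the text pool itself (the data at 0x08 of the "DAT1" section), so that the offsets of the text index table can be used directly.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a big endian 16 bit value from a byte buffer. */
uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Hypothetical helper: return the length of the string at byte offset
 * `off` in 16 bit units, not counting the terminating NULL. Escape
 * sequences count with their full size. Returns -1 if the string or
 * an escape sequence runs past the end of the pool. */
long bmg_string_units(const uint8_t *pool, size_t pool_len, size_t off)
{
    size_t pos = off;

    while (pos + 2 <= pool_len) {
        uint16_t c = read_be16(pool + pos);

        if (c == 0)
            return (long)((pos - off) / 2);    /* real terminator */

        if (c == 0x1a) {
            if (pos + 4 > pool_len)
                return -1;
            size_t total = pool[pos + 2];      /* xx: size in bytes */
            if (total < 4 || total > pool_len - pos)
                return -1;                     /* malformed sequence */
            pos += total;                      /* skip embedded data */
        } else {
            pos += 2;                          /* ordinary character */
        }
    }
    return -1;                                 /* no terminator found */
}
```

For the bytes 0048 001a 0602 0000 0069 0000 the function skips the two zero bytes inside the escape sequence and stops at the final NULL.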

Wiimms BMG Text File

Wiimm has specified a text representation of BMG files for his SZS Tools. The idea is to make editing and creating string tables much easier for humans and scripts. Wiimms SZS Tools accept both raw and text BMG files as input and can convert each format to the other. Moreover, BMG files can be used to patch other BMG files.

Syntax

Each time Wiimms SZS Tools write a BMG text file, the syntax rules are written to the file as comments:

The magic
  • The first 8 characters are the magic and must be "#BMG-TXT" (without quotes).
General syntax
  • Spaces and tabs at the beginning and end of a line are ignored.
  • Empty lines and lines beginning with a '#' are ignored.
  • The coding is UTF-8.
Syntax for message lines

A message line has one of three forms:

MID /
MID ~ NUMBER
MID = TEXT
  • The first form ('/') defines a message without text. This is different from a message with an empty text.
  • The second form ('~') also defines a message without text, but additionally specifies the flags (hex NUMBER, default=0x01000000). If text is needed too, add a '=' line with the same MID.
  • The third form ('=') defines a message with text.
  • Spaces and tabs before the '/', '~' and '=' are ignored.
  • At most one space after the '=' is ignored.
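
The three forms can be told apart with a small classifier. The following sketch is not taken from Wiimms SZS Tools: the names bmg_line_kind and bmg_parse_line are hypothetical, only hexadecimal MIDs are handled, and every other line (comments, empty lines, lines starting with '+', special MID codes) is reported as BMG_LINE_OTHER.

```c
#include <ctype.h>
#include <stdlib.h>

typedef enum {
    BMG_LINE_OTHER,    /* comment, empty line, special code or malformed */
    BMG_LINE_NO_TEXT,  /* MID /        */
    BMG_LINE_FLAGS,    /* MID ~ NUMBER */
    BMG_LINE_TEXT      /* MID = TEXT   */
} bmg_line_kind;

/* Hypothetical helper: classify one line of a BMG text file.
 * On success `*mid` is set; `*flags` or `*text` is set depending on
 * the returned kind. `*text` points into `line`. */
bmg_line_kind bmg_parse_line(const char *line, unsigned long *mid,
                             unsigned long *flags, const char **text)
{
    char *end;

    while (*line == ' ' || *line == '\t')      /* leading blanks */
        line++;
    if (!isxdigit((unsigned char)*line))
        return BMG_LINE_OTHER;

    *mid = strtoul(line, &end, 16);            /* MID is hexadecimal */
    while (*end == ' ' || *end == '\t')        /* blanks before sign */
        end++;

    switch (*end) {
    case '/':
        return BMG_LINE_NO_TEXT;
    case '~': {
        char *num_end;
        *flags = strtoul(end + 1, &num_end, 16);
        return num_end == end + 1 ? BMG_LINE_OTHER : BMG_LINE_FLAGS;
    }
    case '=':
        end++;
        if (*end == ' ')                       /* at most one space */
            end++;
        *text = end;
        return BMG_LINE_TEXT;
    default:
        return BMG_LINE_OTHER;
    }
}
```

The caller is expected to strip trailing spaces, tabs and the line end before using the returned text.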
Message IDs

'MID' is a hexadecimal number or one of the following special codes:

  • Instead of a MID 'Tct' or 'Uct' are allowed:
    • 'T' is a literal 'T' for a standard track.
    • 'U' is a literal 'U' for a battle track.
    • 'c' is the cup number (T=1..8, U=1..2).
    • 't' is the track index within the cup (T=1..4, U=1..5).
    • Both messages related to the specified track are defined.
  • Instead of a MID 'Mnn' is allowed:
    • 'M' is a literal 'M' for an online chat message.
    • 'nn' is a decimal number in the range 1..96.
    • The message related to the specified chat message is defined.
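
The range rules for the special codes can be checked as in the sketch below. It only validates a code and splits it into its parts. The mapping from a code to the real message IDs is not described here and is therefore left out. The names bmg_special_mid and bmg_parse_special_mid are hypothetical, and accepting a single digit after 'M' (for example "M5") is an assumption.

```c
#include <ctype.h>

typedef struct {
    char kind;    /* 'T', 'U' or 'M' */
    int  cup;     /* T/U: cup number; 0 for M */
    int  index;   /* T/U: track index in the cup; M: chat message number */
} bmg_special_mid;

/* Hypothetical helper: parse a NUL terminated code such as "T11",
 * "U25" or "M96". Returns 1 and fills `out` if the code is valid,
 * 0 otherwise. */
int bmg_parse_special_mid(const char *s, bmg_special_mid *out)
{
    if (s[0] == 'T' || s[0] == 'U') {
        int max_cup   = s[0] == 'T' ? 8 : 2;
        int max_track = s[0] == 'T' ? 4 : 5;

        if (!isdigit((unsigned char)s[1]) ||
            !isdigit((unsigned char)s[2]) || s[3] != '\0')
            return 0;

        int cup = s[1] - '0', track = s[2] - '0';
        if (cup < 1 || cup > max_cup || track < 1 || track > max_track)
            return 0;

        out->kind = s[0];
        out->cup = cup;
        out->index = track;
        return 1;
    }

    if (s[0] == 'M') {
        int n = 0, digits = 0;

        for (const char *p = s + 1; *p != '\0'; p++) {
            if (!isdigit((unsigned char)*p) || ++digits > 2)
                return 0;
            n = n * 10 + (*p - '0');
        }
        if (digits == 0 || n < 1 || n > 96)
            return 0;

        out->kind = 'M';
        out->cup = 0;
        out->index = n;
        return 1;
    }
    return 0;
}
```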
Continuation lines

Text definitions can be split into multiple lines:

 + TEXT
  • Multiple continuation lines can be used.
  • Spaces and tabs before the '+' are ignored.
  • At most one space after the '+' is ignored.
Syntax for TEXT
  • The text is scanned character by character assuming UTF-8. Invalid UTF-8 characters are interpreted as pure ASCII.
  • The sign '\' is a special character that starts an escape sequence. The kind of escape is determined by the following character:
    • '\\' : the backslash itself.
    • '\a' : ASCII 7 = 0x07 = BEL (bell)
    • '\b' : ASCII 8 = 0x08 = BS (back space)
    • '\f' : ASCII 12 = 0x0c = FF (form feed)
    • '\n' : ASCII 10 = 0x0a = LF (line feed)
    • '\r' : ASCII 13 = 0x0d = CR (carriage return)
    • '\t' : ASCII 9 = 0x09 = HT (horizontal tabulator)
    • '\v' : ASCII 11 = 0x0b = VT (vertical tabulator)
    • '\nnn' : The numeric code 'nnn' is an octal number of up to 3 digits, so only character codes 0 .. 511 (=0x1ff) are possible.
    • '\x{nnnnn}' : The numeric code 'nnnnn' is a hexadecimal number of any length.
    • '\z{xyy,zzzzzz}' : This is a shortcut for Nintendo's escape sequences.
      • 'x' is one of '6', '8', 'a' or 'c' and defines the byte length of the whole sequence.
      • 'yy' is any byte code.
      • 'zzzzzz' is a 16 (x=6), 32 (x=8), 48 (x=a) or 64 (x=c) bit integer.
      • The data is stored as: \x{1a}\x{xyy} followed by the integer 'zzzzzz'.
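
The storage rule for '\z{xyy,zzzzzz}' can be sketched as a small encoder that produces the raw big endian bytes. The name bmg_encode_escape is a hypothetical helper and not a function of Wiimms SZS Tools. Rejecting values that do not fit into the payload is a choice made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write a 0x1a escape sequence to `out`.
 *   total : byte length of the whole sequence (6, 8, 0x0a or 0x0c)
 *   yy    : the byte code that follows the length
 *   value : the integer, stored big endian in total-4 bytes
 * Returns the number of bytes written, or 0 on invalid input. */
size_t bmg_encode_escape(uint8_t *out, size_t out_len,
                         unsigned total, uint8_t yy, uint64_t value)
{
    if (total != 6 && total != 8 && total != 0x0a && total != 0x0c)
        return 0;
    if (out_len < total)
        return 0;

    unsigned payload = total - 4;              /* 2, 4, 6 or 8 bytes */
    if (payload < 8 && (value >> (8 * payload)) != 0)
        return 0;                              /* value does not fit */

    out[0] = 0x00;                             /* \x{1a}  */
    out[1] = 0x1a;
    out[2] = (uint8_t)total;                   /* \x{xyy} */
    out[3] = yy;
    for (unsigned i = 0; i < payload; i++)     /* the integer */
        out[4 + i] = (uint8_t)(value >> (8 * (payload - 1 - i)));

    return total;
}
```

With this sketch, '\z{602,0}' corresponds to the bytes 001a 0602 0000, the sequence that inserts the name of the current player.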
Example
#BMG-TXT  <<<  The first 8 characters are the magic for a BMG text file.
#         <<<  Don't remove them!

# Set the MID 1234 to "Hello"
 1234 = Hello

# Standard C escape sequences are allowed:
 1234 = Hello\nWiimm

# Continuation lines are also possible
 1234 = Hello\n
      + Wiimm
      
# Flags are specified with ~
 1234 ~ 0x01000000

# For track names use the MID alternative 'Tct'
# Both related messages will be defined.
 T11 = name of first track

Tools

The following tools can handle BMG files: