Difference between revisions of "BMG (File Format)"

From Custom Mario Kart
m (→‎Text Index Table (Info): forgot the one u8)
 
(12 intermediate revisions by 6 users not shown)
Line 1: Line 1:
A '''BMG''' file is a message file. Each message is referenced by a message id (MID).  
+
== Overview ==
 
+
A '''BMG''' file is a message file. Each message is referenced by a message ID (MID).  
 
 
__TOC__
 
  
 +
== Raw BMG File ==
 +
The message file of [[MKW]] consists of a header and three data sections:
 +
* The offset table into the string pool with attributes (section »INF1«).
 +
* The string pool (section »DAT1«).
 +
* The list of related message IDs (section »MID1«). This list is omitted in some games.
  
== Raw BMG File ==
+
Other games use some additional sections:
The the message file consists of a header and three data sections:
+
* Section »FLW1«.
* The offset table into the string pool.
+
* Section »FLI1«.
* The string pool with 16 bit wide characters.
 
* The list of related message ids (MID).
 
  
 
=== Header ===
 
=== Header ===
Line 17: Line 18:
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
! Offset !! Type !! Description
+
! Offset !! Name !! Type !! Description
 +
|-
 +
| 0x00 || <tt>signature</tt> || Char[4] || Part of '''file magic'''. Always ''MESG'' in ASCII.
 +
|-
 +
| 0x04 || <tt>dataType</tt> || Char[4] || Part of '''file magic'''. Always ''bmg1'' in ASCII.
 +
|-
 +
| 0x08 || <tt>dataSize</tt> || UInt32 || '''Length of the file''' in bytes. If Encoding is 0, then it is an old BMG format used in GameCube games. In this case, the file is block aligned (32 bytes each block) and this member is the number of blocks.
 +
|-
 +
| 0x0C || <tt>numBlocks</tt> || UInt32 || '''Number of sections''': Usually 3, and sometimes 2 outside from MKWii.
 
|-
 
|-
| 0x00 || Char[8] || '''File magic'''. Always ''MESGbmg1'' in ASCII.
+
| 0x10 || <tt>charset</tt> || byte || '''Encoding''': 0=Undefined, 1=OneByte (CP1252), 2=TwoBytes (UTF-16), 3=SJIS (Shift-JIS), 4=UTF8. UTF16 in MKWii. Value 0 is used for an old BMG format used by GameCube games and the encoding is presumably CP1252.
 
|-
 
|-
| 0x08 || UInt32 || Length of the file in bytes.
+
| 0x11 || <tt>reserved0</tt> || byte || '''Unknown'''
 
|-
 
|-
| 0x0C || UInt32 || Number of sections. Usually 3, and sometimes 2 outside from MKWii.
+
| 0x12 || <tt>reserved1</tt> || short || '''Unknown'''
 
|-
 
|-
| 0x10 || byte || Encoding. 1=CP1252, 2=UTF-16, 3=Shift-JIS, 4=UTF-8. UTF16 in MKWii.
+
| 0x14 || <tt>reserved</tt> || int[2] || '''Unknown'''
 
|-
 
|-
| 0x11 || Byte[15] || {{Unknown-left|'''Unknown'''. Probably padding.}}
+
| 0x1c || <tt>userWork</tt> || int || '''Unknown'''
 
|}
 
|}
  
Line 35: Line 44:
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
! Offset !! Type !! Description
+
! Offset !! Name !! Type !! Description
 
|-
 
|-
| 0x00 || Char[4] || Section magic.
+
| 0x00 || <tt>kind</tt> || Char[4] || Section magic.
 
|-
 
|-
| 0x04 || UInt32 || Length of the section in bytes.
+
| 0x04 || <tt>size</tt> || UInt32 || Length of the section in bytes.
 
|}
 
|}
 +
 +
Sections usually aligned to 32 bytes. In some [[Wii U]] games, the header of the last section (FLI1) tells the aligned size, but the real size is shorter (cut to total size defined by the file header).
  
 
;GNU C example
 
;GNU C example
Line 57: Line 68:
 
typedef struct bmg_header_t
 
typedef struct bmg_header_t
 
{
 
{
    char           magic[8];       // magic "MESGbmg1"
+
/*00*/  char magic[8]; // = BMG_MAGIC
    be32_t         size;           // total size of file
+
/*08*/  be32_t size; // total size of file
    be32_t         n_section;     // number of sub sections
+
/*0c*/  be32_t n_sections; // number of sections
    be32_t        unknown1;       // = 0x02000000
+
/*10*/  bmg_encoding_t encoding; // text encoding
    char          unknown2[0xc]; // unknown data
+
/*11*/  u8 unknown[15]; // unknown data
    bmg_section_t section[0];     // first sub header
+
/*20*/  bmg_section_t section[0]; // first section header
 
}
 
}
 
__attribute__ ((packed)) bmg_header_t;
 
__attribute__ ((packed)) bmg_header_t;
 +
 
</pre>
 
</pre>
  
=== Text Index Table ===
+
=== Text Index Table (Info) ===
The '''text index table''' is usually the first section of a BMG file.
+
The '''text index table''' (Info) is usually the first section of a BMG file.
  
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
! Offset !! Type<br/>Size !! Description
+
! Offset !! Name !! Type<br/>Size !! Description
 +
|-
 +
| 0x00 || <tt>kind</tt> || Char[4] || '''Section magic'''. Always ''INF1'' in ASCII.
 +
|-
 +
| 0x04 || <tt>size</tt> || UInt32 || Length of the section in bytes.
 
|-
 
|-
| 0x00 || Char[4] || '''Section magic'''. Always ''INF1'' in ASCII.
+
| 0x08 || <tt>numEntries</tt> || UInt16 || '''N''' = Number of messages.
 
|-
 
|-
| 0x04 || UInt32 || Length of the section in bytes.
+
| 0x0A || <tt>entrySize</tt> || UInt16 || '''LEN''' = The length of each item, always 8 in MKWii.
 
|-
 
|-
| 0x08 || UInt16 || '''N''' = Number of messages.
+
| 0x0C || <tt>groupID</tt> || UInt16 || '''BMG file ID''' = ID for this BMG file. Usually 0.
 
|-
 
|-
| 0x0A || UInt16 || '''LEN''' = The length of each item, always 8 in MKWii.
+
| 0x0E || <tt>defaultColor</tt> || byte || Default color index.
 
|-
 
|-
| 0x0C || UInt32 || {{Unknown-left|'''Unknown'''. Probably padding, usually 0.}}
+
| 0x0F || <tt>reserved</tt> || byte || '''Unknown'''
 
|-
 
|-
| 0x10 || '''N''' * '''LEN''' || '''N''' items of the length '''LEN'''.
+
| 0x10 || <tt>messageEntry[]</tt> || '''N''' * '''LEN''' || '''N''' items of the length '''LEN'''.
 
|}
 
|}
  
Line 97: Line 113:
 
|}
 
|}
  
In [[Mario Kart Wii]] each item has 2 data members:
+
In [[Mario Kart Wii]], each item has 2 data members:
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
Line 104: Line 120:
 
| 0x00 || UInt32 || Offset into text pool of the ''DAT1'' section.
 
| 0x00 || UInt32 || Offset into text pool of the ''DAT1'' section.
 
|-
 
|-
| 0x04 || UInt32 || An attribute. The highest byte is used for font selection:
+
| 0x04 || byte || The font to use:
* '''0x00000000''' : Value used for count down and for final race strings like "FINISH!".
+
* '''0x00''' : Value used for count down and for final race strings like "FINISH!".
* '''0x01000000''' : Standard value for nearly all messages.
+
* '''0x01''' : Standard value for nearly all messages.
* '''0x04000000''' : Use special red font (only digits) used for battle and team points.
+
* '''0x04''' : Use special red font (only digits) used for battle and team points.
* '''0x05000000''' : Use special blue font (only digits) used for battle and team points.
+
* '''0x05''' : Use special blue font (only digits) used for battle and team points.
 
Other values are not used by [[Mario Kart Wii]] and most will freeze the game.
 
Other values are not used by [[Mario Kart Wii]] and most will freeze the game.
 +
|-
 +
| 0x05 || byte[3] || Padding
 
|}
 
|}
  
Line 126: Line 144:
 
typedef struct bmg_inf_t
 
typedef struct bmg_inf_t
 
{
 
{
     char          magic[4];       // magic "INF1"
+
     char          kind[4];       // magic "INF1"
 
     be32_t        size;          // total size of the section
 
     be32_t        size;          // total size of the section
     be16_t        n_msg;         // number of messages
+
     be16_t        numEntries;     // number of messages
     be16_t        inf_size;       // size of inf items
+
     be16_t        entrySize;     // size of inf items
     be32_t         unknown;       // = 0
+
     be16_t         groupID;
 +
    be8_t          defaultColor;
 +
    be8_t          reserved;
 
     bmg_inf_item_t list[0];        // data
 
     bmg_inf_item_t list[0];        // data
 
}
 
}
Line 136: Line 156:
 
</pre>
 
</pre>
  
=== String Pool ===
+
=== String Pool (Data) ===
 
The second section contains the string pool.
 
The second section contains the string pool.
 
Strings are sequences of 16-bit values (like UTF-16 or Windows wide char) terminated by a NULL value.
 
Strings are sequences of 16-bit values (like UTF-16 or Windows wide char) terminated by a NULL value.
Line 166: Line 186:
  
 
=== Message IDs ===
 
=== Message IDs ===
The third section contains the table with the message ids (MID).
+
The third section contains the table with the message IDs (MID).
 
The number of elements of this table is equal to element number of the text index table.
 
The number of elements of this table is equal to element number of the text index table.
 
Elements with the same table index are attributes for the same string.
 
Elements with the same table index are attributes for the same string.
Line 174: Line 194:
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
! Offset !! Type !! Description
+
! Offset !! Name !! Type !! Description
 +
|-
 +
| 0x00 || <tt>kind</tt> || Char[4] || '''Section magic'''. Always ''MID1'' in ASCII.
 
|-
 
|-
| 0x00 || Char[4] || '''Section magic'''. Always ''MID1'' in ASCII.
+
| 0x04 || <tt>size</tt> || UInt32 || Length of the section in bytes.
 
|-
 
|-
| 0x04 || UInt32 || Length of the section in bytes.
+
| 0x08 || <tt>numEntries</tt> || UInt16 || '''N''' = Number of messages.
 
|-
 
|-
| 0x08 || UInt16 || '''N''' = Number of messages.
+
| 0x0A || <tt>format</tt> || byte || Two four-bit blocks.  
 
|-
 
|-
| 0x0A || UInt16 || {{Unknown-left|'''Unknown'''. Usually 0x1000.}}
+
| 0x0B || <tt>info</tt> || byte || Defines how the two four-bit blocks above should be interpreted.
 
|-
 
|-
| 0x0C || UInt32 || {{Unknown-left|'''Unknown'''. Probably padding, usually 0.}}
+
| 0x0C || <tt>Reserve</tt> || byte[4] || {{Unknown-left|'''Unknown'''. Probably padding, usually 0.}}
 
|-
 
|-
 
| 0x10 || UInt32['''N'''] || '''N''' message IDs (MID).
 
| 0x10 || UInt32['''N'''] || '''N''' message IDs (MID).
Line 200: Line 222:
 
     be16_t        unknown1;      // = 0x1000
 
     be16_t        unknown1;      // = 0x1000
 
     be32_t        unknown2;      // = 0
 
     be32_t        unknown2;      // = 0
     u32            mid[0];        // message id table
+
     u32            mid[0];        // message ID table
 
}
 
}
 
__attribute__ ((packed)) bmg_mid_t;
 
__attribute__ ((packed)) bmg_mid_t;
Line 215: Line 237:
 
| 0x00 || 001A || Start of the sequence. 0x1A is also known as '''CTRL-Z''' or '''ASCII SUB''' (substitute).
 
| 0x00 || 001A || Start of the sequence. 0x1A is also known as '''CTRL-Z''' or '''ASCII SUB''' (substitute).
 
|-
 
|-
| 0x02 || xxyy ||
+
| 0x02 || xx || Total size in bytes of the escape sequence. Only even values are possible.  
* '''xx''' is the total size in bytes of the escape sequence. Only values 06, 08, 0A and 0C are seen at MKWii. Some other games use longer sequences, but only even numbers (06, 08, 0A, ..., FE ) are possible.
 
* '''yy''' is something like an usage ID.
 
 
|-
 
|-
| 0x04 || nnnn || The binary value ([[big endian]]). The size depends on the '''xx''' above:
+
| 0x03 || yy zz zz || Message tag ID. Consists of a tag group number (yy) and a tag number (zzzz).  
* '''xx=06:''' nnnn is a 16 bit value
 
* '''xx=08:''' nnnn is a 32 bit value split into 2 16-bit words
 
* '''xx=0A:''' nnnn is a 48 bit value split into 3 16-bit words
 
* '''xx=0C:''' nnnn is a 64 bit value split into 4 16-bit words
 
 
|}
 
|}
 +
 +
=== Tag number list for tag group FF ===
 +
 +
0000: Color (single byte)
 +
0001: Font size (two bytes, in percent)
 +
0002: Ruby character support (https://en.wikipedia.org/wiki/Ruby_character)
 +
0003: Font (single byte font number)
 +
0004: JMessage (string expansion)
 +
0005: JMessage (extract message data)
 +
0006: JMessage (message data destination)
 +
0007: Extended ruby character support
  
 
Here are some well known escape sequences:
 
Here are some well known escape sequences:
Line 232: Line 259:
 
! Hex Values || Description
 
! Hex Values || Description
 
|-
 
|-
| <tt>1A 602 0000</tt> || Name of the current player.
+
| <tt>1A 06 02 0000</tt> || Name of the current player.
 +
|-
 +
| <tt>1A 08 00 0000 00xx </tt>
 +
| Set the font size to xx percent.
 +
|-
 +
| <tt>1A 08 00 0001 00xx </tt>
 +
| Set the text color ('''xx:''' 00=gray, 02=white, 40+20+32=red, 30=yellow, 33=green, 21+31=blue, 08=transparent)
 
|-
 
|-
| <tt>1A 800 0001 00xx </tt>
+
| <tt>1A 08 01 xxxx xxxx </tt> || Unicode character <tt>U+xxxxxxxx</tt>.
| Set the text color ('''xx:''' 00=grey, 02=white, 40+20+32=red, 30=yellow, 33=green, 21+31=blue, 08=transparent)
 
 
|-
 
|-
| <tt>1A 801 xxxx xxxx </tt> || Unicode character <tt>U+xxxxxxxx</tt>.
+
| <tt>1A 08 02 0011 0000 </tt> || Time in mm:ss.xxx format.
 
|-
 
|-
| <tt>1A 802 0012 0000 </tt> || Name of a player.
+
| <tt>1A 08 02 0012 0000 </tt> || Name of a player.
 
|-
 
|-
| <tt>1A A02 0010 000x 000y </tt> || Context dependent integer with (at least) #y digits. #x is a zero based index into the integer parameter list.
+
| <tt>1A 0A 02 0010 000x 000y </tt> || Context dependent integer with (at least) #y digits. #x is a zero based index into the integer parameter list.
 
|}
 
|}
  

Latest revision as of 08:08, 4 July 2022

Overview

A BMG file is a message file. Each message is referenced by a message ID (MID).

Raw BMG File

The message file of MKW consists of a header and three data sections:

  • The offset table into the string pool with attributes (section »INF1«).
  • The string pool (section »DAT1«).
  • The list of related message IDs (section »MID1«). This list is omitted in some games.

Other games use some additional sections:

  • Section »FLW1«.
  • Section »FLI1«.

Header

In BMG files all numbers are stored in big endian format. Each BMG file starts with a BMG header:

Offset Name Type Description
0x00 signature Char[4] Part of file magic. Always MESG in ASCII.
0x04 dataType Char[4] Part of file magic. Always bmg1 in ASCII.
0x08 dataSize UInt32 Length of the file in bytes. If Encoding is 0, then it is an old BMG format used in GameCube games. In this case, the file is block aligned (32 bytes each block) and this member is the number of blocks.
0x0C numBlocks UInt32 Number of sections: Usually 3, and sometimes 2 in games other than MKWii.
0x10 charset byte Encoding: 0=Undefined, 1=OneByte (CP1252), 2=TwoBytes (UTF-16), 3=SJIS (Shift-JIS), 4=UTF-8. MKWii uses UTF-16. Value 0 is used for an old BMG format used by GameCube games; the encoding is then presumably CP1252.
0x11 reserved0 byte Unknown
0x12 reserved1 short Unknown
0x14 reserved int[2] Unknown
0x1c userWork int Unknown
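
The header table above can be turned into a small parser. The following is only a minimal sketch that reads from a raw byte buffer; the names read_be32, bmg_info_t and bmg_parse_header are hypothetical and are not taken from any existing tool. The header length of 0x20 bytes is taken from the offsets in the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a big-endian 32-bit value from a byte buffer. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical host-order view of the header fields used here. */
typedef struct
{
    uint32_t file_bytes;   /* total file length in bytes */
    uint32_t n_sections;   /* numBlocks at offset 0x0C */
    uint8_t  encoding;     /* charset at offset 0x10 */
} bmg_info_t;

/* Returns 0 on success, -1 if the buffer is too short or the magic is wrong. */
static int bmg_parse_header(const uint8_t *buf, size_t len, bmg_info_t *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 0x20 || memcmp(buf, "MESGbmg1", 8) != 0)
        return -1;

    out->encoding   = buf[0x10];
    out->n_sections = read_be32(buf + 0x0C);

    /* In the old GameCube format (encoding 0), dataSize counts
       32-byte blocks instead of bytes. */
    uint32_t data_size = read_be32(buf + 0x08);
    out->file_bytes = (out->encoding == 0) ? data_size * 32u : data_size;
    return 0;
}
```

A caller would load the whole file into memory, call bmg_parse_header(buf, len, &info) and compare info.file_bytes with the real file length before reading any section.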

The header is followed by the sections. Each section starts with a part that is independent of the section type:

Offset Name Type Description
0x00 kind Char[4] Section magic.
0x04 size UInt32 Length of the section in bytes.

Sections are usually aligned to 32 bytes. In some Wii U games, the header of the last section (FLI1) gives the aligned size, but the real size is shorter (cut to the total size defined by the file header).
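
Because every section starts with the same magic and size pair, a reader can walk the file section by section. The sketch below assumes the whole file is in memory and that the first section starts at offset 0x20; bmg_find_section and read_be32 are hypothetical names. It clamps a section size that runs past the end of the buffer, which is one possible way to handle the shortened last section mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a big-endian 32-bit value from a byte buffer. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns the file offset of the first section whose magic equals kind
   (4 characters), or 0 if there is none. The usable section size is
   stored in *size_out. */
static size_t bmg_find_section(const uint8_t *buf, size_t len,
                               const char *kind, size_t *size_out)
{
    assert(buf != NULL && kind != NULL && size_out != NULL);

    size_t off = 0x20;                      /* sections follow the header */
    while (off + 8 <= len)
    {
        size_t size = read_be32(buf + off + 4);
        if (size < 8)
            break;                          /* malformed: smaller than its own header */
        if (size > len - off)
            size = len - off;               /* clamp an over-long (aligned) size */

        if (memcmp(buf + off, kind, 4) == 0)
        {
            *size_out = size;
            return off;
        }
        off += size;
    }
    return 0;
}
```

Returning 0 for "not found" is safe here because offset 0 always holds the file header, never a section.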

GNU C example
#define BMG_MAGIC      "MESGbmg1"
#define BMG_MAGIC_NUM  0x4d455347626d6731ull

typedef struct bmg_section_t
{
    char           magic[4];       // a magic to identify the section
    be32_t         size;           // total size of the section
    u8             data[0];        // section data
}
__attribute__ ((packed)) bmg_section_t;

typedef struct bmg_header_t
{
 /*00*/  char		magic[8];	// = BMG_MAGIC
 /*08*/  be32_t		size;		// total size of file
 /*0c*/  be32_t		n_sections;	// number of sections
 /*10*/  bmg_encoding_t	encoding;	// text encoding
 /*11*/  u8		unknown[15];	// unknown data
 /*20*/  bmg_section_t	section[0];	// first section header
}
__attribute__ ((packed)) bmg_header_t;

Text Index Table (Info)

The text index table (Info) is usually the first section of a BMG file.

Offset Name Type/Size Description
0x00 kind Char[4] Section magic. Always INF1 in ASCII.
0x04 size UInt32 Length of the section in bytes.
0x08 numEntries UInt16 N = Number of messages.
0x0A entrySize UInt16 LEN = The length of each item, always 8 in MKWii.
0x0C groupID UInt16 BMG file ID = ID for this BMG file. Usually 0.
0x0E defaultColor byte Default color index.
0x0F reserved byte Unknown
0x10 messageEntry[] N * LEN N items of the length LEN.

Format of each item:

Offset Type/Size Description
0x00 UInt32 Offset into text pool of the DAT1 section.
0x04 LEN - 4 Attributes. The actual length varies between files, but is the same for all items of one file.

In Mario Kart Wii, each item has 2 data members:

Offset Type Description
0x00 UInt32 Offset into text pool of the DAT1 section.
0x04 byte The font to use:
  • 0x00 : Value used for count down and for final race strings like "FINISH!".
  • 0x01 : Standard value for nearly all messages.
  • 0x04 : Use special red font (only digits) used for battle and team points.
  • 0x05 : Use special blue font (only digits) used for battle and team points.

Other values are not used by Mario Kart Wii and most will freeze the game.

0x05 byte[3] Padding
GNU C example
#define BMG_INF_MAGIC      "INF1"
#define BMG_INF_STD_FLAGS  0x01000000

typedef struct bmg_inf_item_t
{
    be32_t         text_index;     // offset into text pool
    be32_t         attribute;      // attribute of the string, 0x01000000 is standard
}
__attribute__ ((packed)) bmg_inf_item_t;

typedef struct bmg_inf_t
{
    char           kind[4];        // magic "INF1"
    be32_t         size;           // total size of the section
    be16_t         numEntries;     // number of messages
    be16_t         entrySize;      // size of inf items
    be16_t         groupID;        // ID of this BMG file, usually 0
    u8             defaultColor;   // default color index
    u8             reserved;       // unknown
    bmg_inf_item_t list[0];        // data
}
__attribute__ ((packed)) bmg_inf_t;
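
Looking up one item of the text index table can be sketched as follows. The function bmg_inf_entry and its helpers are hypothetical names; the font byte at item offset 0x04 follows the Mario Kart Wii layout described above and may mean something else in other games. The returned text offset is relative to the string pool, which starts at offset 0x08 of the DAT1 section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: read big-endian values from a byte buffer. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* inf points at the "INF1" magic, inf_len is the section length.
   Returns 0 on success, -1 if the index or the section is invalid. */
static int bmg_inf_entry(const uint8_t *inf, size_t inf_len, unsigned index,
                         uint32_t *text_off, uint8_t *font)
{
    assert(inf != NULL && text_off != NULL && font != NULL);
    if (inf_len < 0x10)
        return -1;

    unsigned n          = read_be16(inf + 0x08);    /* N   */
    unsigned entry_size = read_be16(inf + 0x0A);    /* LEN */
    if (index >= n || entry_size < 4)
        return -1;

    size_t pos = 0x10 + (size_t)index * entry_size;
    if (pos + entry_size > inf_len)
        return -1;

    *text_off = read_be32(inf + pos);
    /* First attribute byte; assumed to be the font as in MKWii.
       Set to 0 if the item carries no attribute bytes at all. */
    *font = (entry_size >= 5) ? inf[pos + 4] : 0;
    return 0;
}
```

Using the stored item length instead of a fixed 8 keeps the lookup valid for files whose items carry more or fewer attribute bytes.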

String Pool (Data)

The second section contains the string pool. Strings are sequences of 16-bit values (like UTF-16 or Windows wide char) terminated by a NULL value. But NULL values are also possible in the middle of the string, if they are part of a 0x1A escape sequence.

Offset Type Description
0x00 Char[4] Section magic. Always »DAT1« in ASCII.
0x04 UInt32 The total size of the section. Always a multiple of 4 and usually a multiple of 32.
0x08 Char16[*] The string pool.
GNU C example
#define BMG_DAT_MAGIC  "DAT1"

typedef struct bmg_dat_t
{
    char           magic[4];       // magic "DAT1"
    be32_t         size;           // total size of the section
    u8             text_pool[];
}
__attribute__ ((packed)) bmg_dat_t;
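
Since a 0x1A escape sequence may contain NULL values, a plain search for the first 16-bit zero can end a string too early. The following sketch measures a string while skipping escape sequences. It assumes UTF-16 encoding (charset 2) and uses the size byte that follows the 0x001A value; the minimum size of 6 is derived from the escape sequence layout (2 bytes start value, 1 byte size, 3 bytes tag ID). bmg_str_units and read_be16 are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a big-endian 16-bit value from a byte buffer. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* pool points at the string pool (offset 0x08 of DAT1), off is a text
   offset from the text index table. Returns the number of 16-bit units
   before the terminating NULL, or (size_t)-1 if the string is
   unterminated or an escape sequence is malformed. */
static size_t bmg_str_units(const uint8_t *pool, size_t pool_len, size_t off)
{
    assert(pool != NULL);

    size_t pos = off;
    while (pos + 2 <= pool_len)
    {
        unsigned ch = read_be16(pool + pos);
        if (ch == 0)
            return (pos - off) / 2;

        if (ch == 0x1A)
        {
            if (pos + 3 > pool_len)
                break;
            unsigned esc = pool[pos + 2];       /* total size of the sequence */
            if (esc < 6 || (esc & 1) || esc > pool_len - pos)
                break;
            pos += esc;                         /* skip the embedded binary data */
        }
        else
            pos += 2;
    }
    return (size_t)-1;
}
```

For the bytes 00 41, 00 1A 06 02 00 00, 00 42, 00 00 a naive scan would stop inside the escape sequence, while this function counts all five 16-bit units before the real terminator.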

Message IDs

The third section contains the table with the message IDs (MID). The number of elements in this table equals the number of elements in the text index table. Elements with the same table index belong to the same string.

The start menu of the Wii uses similar BMG files, but without this »MID1« section. Without »MID1«, a BMG file is only a sequence of strings rather than a numerically indexed array.

Offset Name Type Description
0x00 kind Char[4] Section magic. Always MID1 in ASCII.
0x04 size UInt32 Length of the section in bytes.
0x08 numEntries UInt16 N = Number of messages.
0x0A format byte Two four-bit blocks.
0x0B info byte Defines how the two four-bit blocks above should be interpreted.
0x0C Reserve byte[4] Unknown. Probably padding, usually 0.
0x10 UInt32[N] N message IDs (MID).
GNU C example
#define BMG_MID_MAGIC "MID1"

typedef struct bmg_mid_t
{
    char           magic[4];       // magic "MID1"
    be32_t         size;           // total size of the section
    be16_t         n_msg;          // number of messages
    be16_t         unknown1;       // = 0x1000: format byte (0x10) followed by info byte (0x00)
    be32_t         unknown2;       // = 0, reserved
    u32            mid[0];         // message ID table (stored big endian)
}
__attribute__ ((packed)) bmg_mid_t;
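
Because entries with the same table index belong together, finding a message by its ID means searching the MID1 table and then using the resulting index in the text index table. A minimal sketch of the search, with the hypothetical names bmg_mid_index, read_be16 and read_be32:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: read big-endian values from a byte buffer. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* mid points at the "MID1" magic, mid_len is the section length.
   Returns the table index of the wanted message ID, or -1 if absent. */
static long bmg_mid_index(const uint8_t *mid, size_t mid_len, uint32_t wanted)
{
    assert(mid != NULL);
    if (mid_len < 0x10)
        return -1;

    size_t n = read_be16(mid + 0x08);           /* N = number of messages */
    if (0x10 + n * 4 > mid_len)
        return -1;                              /* table does not fit */

    for (size_t i = 0; i < n; i++)
        if (read_be32(mid + 0x10 + i * 4) == wanted)
            return (long)i;
    return -1;
}
```

The linear search makes no assumption about the order of the IDs; whether real files keep the table sorted is not stated here, so a binary search would be a further assumption.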

0x1A Escape Sequences

Nintendo uses sequences starting with the 16 bit value 0x1A to embed binary data in the strings. This binary data may contain NULL values, which are normally used as the end-of-string marker.

Offset Hex Value Description
0x00 001A Start of the sequence. 0x1A is also known as CTRL-Z or ASCII SUB (substitute).
0x02 xx Total size in bytes of the escape sequence. Only even values are possible.
0x03 yy zz zz Message tag ID. Consists of a tag group number (yy) and a tag number (zzzz).

Tag number list for tag group FF

  • 0000: Color (single byte)
  • 0001: Font size (two bytes, in percent)
  • 0002: Ruby character support (https://en.wikipedia.org/wiki/Ruby_character)
  • 0003: Font (single byte font number)
  • 0004: JMessage (string expansion)
  • 0005: JMessage (extract message data)
  • 0006: JMessage (message data destination)
  • 0007: Extended ruby character support

Here are some well known escape sequences:

Hex Values Description
1A 06 02 0000 Name of the current player.
1A 08 00 0000 00xx Set the font size to xx percent.
1A 08 00 0001 00xx Set the text color (xx: 00=gray, 02=white, 40+20+32=red, 30=yellow, 33=green, 21+31=blue, 08=transparent)
1A 08 01 xxxx xxxx Unicode character U+xxxxxxxx.
1A 08 02 0011 0000 Time in mm:ss.xxx format.
1A 08 02 0012 0000 Name of a player.
1A 0A 02 0010 000x 000y Context dependent integer with (at least) #y digits. #x is a zero based index into the integer parameter list.

For more escape sequences see the List of found 0x1A escape sequences.
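
The escape sequence layout can be decoded with a few byte reads. The sketch below splits one sequence into size, tag group, tag number and the remaining argument bytes; bmg_escape_t and bmg_parse_escape are hypothetical names, and the meaning of the argument bytes depends on the tag and is not interpreted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of one 0x1A escape sequence. */
typedef struct
{
    uint8_t        size;      /* xx: total size in bytes, always even */
    uint8_t        group;     /* yy: tag group number */
    uint16_t       tag;       /* zzzz: tag number */
    const uint8_t *args;      /* bytes following the tag ID */
    size_t         args_len;  /* size - 6 */
} bmg_escape_t;

/* p points at the 0x001A value, avail is the number of readable bytes.
   Returns 0 on success, -1 if p does not hold a complete sequence. */
static int bmg_parse_escape(const uint8_t *p, size_t avail, bmg_escape_t *out)
{
    assert(p != NULL && out != NULL);
    if (avail < 6 || p[0] != 0x00 || p[1] != 0x1A)
        return -1;

    uint8_t size = p[2];
    if (size < 6 || (size & 1) || size > avail)
        return -1;

    out->size     = size;
    out->group    = p[3];
    out->tag      = (uint16_t)((p[4] << 8) | p[5]);
    out->args     = p + 6;
    out->args_len = (size_t)size - 6;
    return 0;
}
```

For the sequence 1A 0A 02 0010 0001 0003 this yields size 0x0A, group 0x02, tag 0x0010 and four argument bytes, which the table above describes as the parameter index and the digit count.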

BMG files in Mario Kart Wii

In Mario Kart Wii, BMG files are found only in the language dependent SZS files of the directory /Scene/UI. All language dependent files have an underscore and a language letter before .szs, like Event_G.szs. All BMG files can be found in a subdirectory of the SZS named /messages/. The file names are: Common.bmg, Menu.bmg, Number.bmg, Race.bmg and StaffRoll.bmg.

The messages are repeated in the different files. All messages with the same message ID and the same language always have the same text; no differences between the text files can be found.

Wiimms BMG Text File

Wiimm has specified a text representation of the BMG files for his SZS Tools. The idea is to make editing and creating string tables much easier for humans and scripts. Wiimms SZS Tools accept binary and text BMG files as input and can convert either format into the other. Moreover, BMG files of both formats can be used to patch other BMG files of both formats.

Syntax and Semantics of BMG text files

Messages of Mario Kart Wii

Here you can find all messages of the supported languages of Mario Kart Wii as text files.

Tools

The following tools can handle BMG files:

Only Wiimms SZS Tools can handle BMG-Text files.