Difference between revisions of "YAZ0 (File Format)"

From Custom Mario Kart
 
(22 intermediate revisions by 5 users not shown)
Line 1: Line 1:
'''Yaz0''' is a run length encoding (RLE compression) method. In [[Mario Kart Wii]] most of the [[SZS]] files are '''Yaz0 compressed [[U8]] files.'''
+
'''Yaz0''' is a run length encoding (RLE compression) method. In [[Mario Kart Wii]] most of the [[SZS]] files are '''Yaz0 compressed [[U8]] files'''. See »[[File_format#U8_archives|File format: U8 archives]]« for details.
 +
 
 +
 
 +
__TOC__
 +
 
  
 
== Data structure ==
 
== Data structure ==
 
=== Header ===
 
=== Header ===
 
+
The header of a Yaz0 file is always 16 bytes long. All numeric values are stored as [[big endian]].
The header of a Yaz0 file is always 16 bytes long. All numeric values stored as [[ big endian]] values.
 
  
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
! Offset
+
! Offset !! Type !! Description
! Type
 
! Description
 
 
|-
 
|-
| 0x00 || char[4] || always "Yaz0"
+
| 0x00 || Char[4] || '''magic'''. Always ''Yaz0'' in ASCII.
 
|-
 
|-
| 0x04 || u32 || size of the uncompressed data
+
| 0x04 || UInt32 || Size in bytes of the uncompressed data.
 
|-
 
|-
| 0x08 || char[8] || always zero (padding)
+
| 0x08 || UInt32[2] || Reserved for special use. Always 0 in [[Mario Kart Wii]].
 
|}
 
|}
  
Line 25: Line 26:
 
     char magic[4]; // always "Yaz0"
 
     char magic[4]; // always "Yaz0"
 
     be32_t uncompressed_size; // total size of uncompressed data
 
     be32_t uncompressed_size; // total size of uncompressed data
     char padding[8]; // always 0?
+
     be32_t reserved[2]; // two unsigned integers reserved for special use
 
}
 
}
 
__attribute__ ((packed)) yaz0_header_t;
 
__attribute__ ((packed)) yaz0_header_t;
Line 31: Line 32:
  
 
=== Data Groups ===
 
=== Data Groups ===
 
+
The complete compressed data is organized in '''data groups'''. Each data group consists of 1 group header byte and 8 '''chunks'''.
The complete compressed data is organized in '''data groups'''. Each data group consists of 1 group header byte an 8 '''chunks''':
 
  
 
{| class="wikitable"
 
{| class="wikitable"
Line 38: Line 38:
 
! N !! Size !! Description
 
! N !! Size !! Description
 
|-
 
|-
| 1 || 1 || the group header byte
+
| 1 || align=center | 1 byte || the group header byte
 
|-
 
|-
| 8 || 1-3 || 8 chunks
+
| 8 || 1-3 bytes || 8 chunks
 
|}
 
|}
  
Each bit of the group header corespondents to one chunk:
+
Each bit of the group header corresponds to one chunk:
* The MSB (most significant bit, 0x80) corespondents to chunk 1
+
* The MSB (most significant bit, 0x80) corresponds to chunk 1
* The LSB (lowest significant bit, 0x01) corespondents to chunk 8
+
* The LSB (lowest significant bit, 0x01) corresponds to chunk 8
  
A set bit (=1) in the group header means, that the chunk is 1 exact 1 byte long. This byte must be copied to the output stream 1:1. A cleared bit (=0) defines, that the chunk is 2 or 3 bytes long interpreted as a back reference to already decompressed data that must be copied.
+
A set bit (=1) in the group header means, that the chunk is exact 1 byte long. This byte must be copied to the output stream 1:1. A cleared bit (=0) defines, that the chunk is 2 or 3 bytes long interpreted as a backreference to already decompressed data that must be copied.
  
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
! Size !! 1.B !! 2.B !! 3.B !! Comment
+
! Size !! Data Bytes !! colspan=2 | Size Calculation
 
|-
 
|-
| 2 || <tt>NR</tt> || <tt>RR</tt> || &mdash; || <tt>N=1..f, SIZE=N+2</tt>
+
| 2 bytes || <tt>NR RR</tt> || <tt>N = 1..f || SIZE = N+2 (=3..0x11)</tt>
 
|-
 
|-
| 3 || <tt>0R</tt> || <tt>RR</tt> || <tt>NN</tt> || <tt>N=00..ff, SIZE=N+0x12</tt>
+
| 3 bytes || <tt>0R RR NN</tt> || <tt>N = 00..ff || SIZE = N+0x12 (=0x12..0x111)</tt>
 
|}
 
|}
  
* <tt>RRR</tt> is a value between <tt>0x000</tt> and <tt>0xfff</tt>. Go back <tt>RRR+1</tt> bytes in the output stream to find the start of the data to be copied.
+
* '''<tt>RRR</tt>''' is a value between <tt>0x000</tt> and <tt>0xfff</tt>. Go back <tt>RRR+1</tt> bytes in the output stream to find the start of the data to be copied.
* <tt>SIZE</tt> is the number of bytes to be copied.
+
* '''<tt>SIZE</tt>''' is calculated from '''<tt>N</tt>''' (see above) and declares the number of bytes to be copied.
* It is important to know the a chunk may reference to itself. For example if <tt>RRR=0</tt> and <tt>SIZE=10</tt> the previos byte is copied 10 times.
+
* It is important to know, that a chunk may reference itself. For example if <tt>RRR=1</tt> (go back 1+1=2) and <tt>SIZE=10</tt> the previous 2 bytes are copied 10/2=5 times.
  
Decoding data groups and chunks are done until the end of the destination data is reached.
+
Decoding data groups and chunks is done until the end of the destination data is reached.
  
 
== Examples ==
 
== Examples ==
 
=== Decompression ===
 
=== Decompression ===
 
 
;GNU C example
 
;GNU C example
 
<pre>
 
<pre>
Line 74: Line 73:
 
u8 * dest_end      = // pointer to end of destination (last byte +1)
 
u8 * dest_end      = // pointer to end of destination (last byte +1)
  
u8  code            = 0; // code ...
+
u8  group_head      = 0; // group header byte ...
int code_len        = 0; // ... and code_len used to manage groups
+
int group_head_len  = 0; // ... and it's length to manage groups
  
while ( src < src_end && dest < dest_end )  
+
while ( src < src_end && dest < dest_end )
{  
+
{
     if (!code_len--)
+
     if (!group_head_len)
     {  
+
     {
        code = *src++;
+
//*** start a new data group and read the group header byte.
        code_len = 7;
 
    }
 
  
     if ( code & 0x80 )
+
        group_head = *src++;
 +
        group_head_len = 8;
 +
    }
 +
 
 +
    group_head_len--;
 +
     if ( group_head & 0x80 )
 
     {
 
     {
         // copy 1 byte direct
+
         //*** bit in group header byte is set -> copy 1 byte direct
 +
 
 
         *dest++ = *src++;
 
         *dest++ = *src++;
 
     }
 
     }
     else  
+
     else
     {  
+
     {
         // rle part
+
         //*** bit in group header byte is not set -> run length encoding
  
         const u8 b1 = *src++;
+
// read the first 2 bytes of the chunk
         const u8 b2 = *src++;
+
         const u8 b1 = *src++;
 +
         const u8 b2 = *src++;
 +
       
 +
// calculate the source position
 
         const u8 * copy_src = dest - (( b1 & 0x0f ) << 8 | b2 ) - 1;
 
         const u8 * copy_src = dest - (( b1 & 0x0f ) << 8 | b2 ) - 1;
  
         int n = b1 >> 4;  
+
// calculate the number of bytes to copy.
         if (!n)  
+
         int n = b1 >> 4;
             n = *src++ + 0x12;  
+
 
 +
         if (!n)
 +
             n = *src++ + 0x12; // N==0 -> read third byte
 
         else
 
         else
             n += 2;
+
             n += 2; // add 2 to length
 
         ASSERT( n >= 3 && n <= 0x111 );
 
         ASSERT( n >= 3 && n <= 0x111 );
  
         if ( copy_src < szs->data && dest + n > dest_end )
+
// a validity check
 +
         if ( copy_src < szs->data || dest + n > dest_end )
 
             return ERROR("Corrupted data!\n");
 
             return ERROR("Corrupted data!\n");
  
 +
// copy chunk data.
 
         // don't use memcpy() or memmove() here because
 
         // don't use memcpy() or memmove() here because
 
         // they don't work with self referencing chunks.
 
         // they don't work with self referencing chunks.
Line 113: Line 123:
 
             *dest++ = *copy_src++;
 
             *dest++ = *copy_src++;
 
     }
 
     }
     code <<= 1;
+
 
 +
     // shift group header byte
 +
    group_head <<= 1;
 
}
 
}
 +
 +
// some assertions to find errors in debugging mode
 
ASSERT( src <= src_end );
 
ASSERT( src <= src_end );
 
ASSERT( dest <= dest_end );
 
ASSERT( dest <= dest_end );
 
</pre>
 
</pre>
This code example is taken from [[Wiimms SZS Tools]]: SVN repository [http://opensvn.wiimm.de/viewvc/wii/trunk/wiimms-szs-tools/src/lib-szs.c?annotate=2437#l159 lib-szs.c line 159]
+
This code example is taken from [[Wiimms SZS Tools]]: SVN repository [http://opensvn.wiimm.de/viewvc/wii/trunk/wiimms-szs-tools/src/lib-szs.c?annotate=2498#l162 lib-szs.c line 162]
 +
 
 +
== Tools ==
 +
The following tools can handle compressed U8 files (=SZS files):
 +
* [[CTools Pack]], by [[MrBean35000vr]] and [[Chadderz]]
 +
* [[SZS Modifier]], by [[MrBean35000vr]] and [[Chadderz]]
 +
* [[Wexos's Toolbox]], by [[Wexos]]
 +
* [[Wiimms SZS Tools]], by [[Wiimm]]
 +
 
 +
[[Wexos's Toolbox]] and [[Wiimms SZS Tools]] can (de)compress any kind of Yaz0-compressed files. [[CTools]] and [[SZS Modifier]] can only handle [[U8]] files.
  
[[category:File Format]]
+
[[Category:File Format/Other]]

Latest revision as of 19:43, 30 May 2021

Yaz0 is a compression method built on run-length encoding (RLE): repeated data is stored as back references into the already decompressed output. In Mario Kart Wii, most of the SZS files are Yaz0-compressed U8 files. See »File format: U8 archives« for details.



Data structure

Header

The header of a Yaz0 file is always 16 bytes long. All numeric values are stored in big-endian byte order.

Offset  Type       Description
0x00    Char[4]    magic. Always Yaz0 in ASCII.
0x04    UInt32     Size in bytes of the uncompressed data.
0x08    UInt32[2]  Reserved for special use. Always 0 in Mario Kart Wii.

GNU C example
typedef struct yaz0_header_t
{
    char	magic[4];		// always "Yaz0"
    be32_t	uncompressed_size;	// total size of uncompressed data
    be32_t	reserved[2];		// two unsigned integers reserved for special use
}
__attribute__ ((packed)) yaz0_header_t;
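
The header layout above can also be read without a packed struct. The following is a minimal sketch, not code from any of the tools named on this page; the helper names read_be32() and yaz0_parse_header() are hypothetical. It reads the size byte by byte, so it does not depend on the byte order of the host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a big-endian 32-bit value byte by byte,
 * so the result is the same on little- and big-endian hosts. */
uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] <<  8) |  (uint32_t)p[3];
}

/* Hypothetical helper: validate the 16-byte header.
 * Returns 0 and stores the uncompressed size on success, -1 otherwise. */
int yaz0_parse_header(const uint8_t *buf, size_t len, uint32_t *out_size)
{
    assert(buf != NULL && out_size != NULL);

    if (len < 16 || memcmp(buf, "Yaz0", 4) != 0)
        return -1;                     /* too short or wrong magic */

    *out_size = read_be32(buf + 4);    /* offset 0x04: uncompressed size */
    /* offsets 0x08..0x0f are reserved and not interpreted here */
    return 0;
}
```

The value stored in out_size is the number of bytes the decoder has to produce, so it can be used to allocate the destination buffer.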

Data Groups

The complete compressed data is organized in data groups. Each data group consists of 1 group header byte and 8 chunks.

N  Size            Description
1  1 byte          the group header byte
8  1-3 bytes each  8 chunks

Each bit of the group header corresponds to one chunk:

  • The MSB (most significant bit, 0x80) corresponds to chunk 1
  • The LSB (least significant bit, 0x01) corresponds to chunk 8

A set bit (=1) in the group header means that the chunk is exactly 1 byte long; this byte is copied to the output stream unchanged. A cleared bit (=0) means that the chunk is 2 or 3 bytes long and is interpreted as a back reference to already decompressed data, which must be copied to the output.
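
As a small illustration of the bit order, the following sketch tests the bit that belongs to a given chunk. The function name yaz0_chunk_is_literal() is hypothetical, and chunks are numbered 1 to 8 as in the list above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if chunk number `chunk` (1..8) of a data
 * group is a single literal byte, 0 if it is a back reference.
 * Chunk 1 is controlled by the MSB (0x80), chunk 8 by the LSB (0x01). */
int yaz0_chunk_is_literal(uint8_t group_head, int chunk)
{
    assert(chunk >= 1 && chunk <= 8);
    return (group_head >> (8 - chunk)) & 1;
}
```

For example, a group header of 0xA0 (binary 1010 0000) announces a literal byte, a back reference, another literal byte and then five back references.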

Size     Data Bytes  Size Calculation
2 bytes  NR RR       N = 1..f     SIZE = N+2 (=3..0x11)
3 bytes  0R RR NN    N = 00..ff   SIZE = N+0x12 (=0x12..0x111)
  • RRR is a value between 0x000 and 0xfff. Go back RRR+1 bytes in the output stream to find the start of the data to be copied.
  • SIZE is calculated from N (see above) and declares the number of bytes to be copied.
  • Note that a chunk may reference itself: the copy may run into bytes that the same chunk has just written. For example, if RRR=1 (go back 1+1=2 bytes) and SIZE=10, the previous 2 bytes are repeated 10/2=5 times.
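
The rules for RRR, N and SIZE can be sketched as two small functions. This is only an illustration under assumptions: the type yaz0_backref_t and both function names are hypothetical, and the caller is assumed to have checked that enough source bytes and enough output history are available.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type for a decoded back reference chunk. */
typedef struct {
    unsigned distance;  /* RRR + 1: how far to go back in the output  */
    unsigned size;      /* SIZE: number of bytes to copy              */
    unsigned length;    /* bytes used by the chunk in the source: 2/3 */
} yaz0_backref_t;

/* Decode the chunk at p. Assumes 2 readable bytes, or 3 if the high
 * nibble of p[0] is zero. */
yaz0_backref_t yaz0_decode_backref(const uint8_t *p)
{
    yaz0_backref_t ref;
    const unsigned n = p[0] >> 4;

    ref.distance = (((unsigned)(p[0] & 0x0f) << 8) | p[1]) + 1;
    if (n == 0) {
        ref.size   = p[2] + 0x12u;   /* 3 byte form: 0R RR NN */
        ref.length = 3;
    } else {
        ref.size   = n + 2;          /* 2 byte form: NR RR */
        ref.length = 2;
    }
    return ref;
}

/* Copy byte by byte so that a chunk may read bytes it has just written. */
void yaz0_copy_backref(uint8_t *dest, unsigned distance, unsigned size)
{
    const uint8_t *from = dest - distance;

    assert(distance >= 1);
    while (size-- > 0)
        *dest++ = *from++;
}
```

With the chunk bytes 80 01 this gives a distance of 2 and a size of 10, which is the self-referencing case from the list above: if the output so far ends in "ab", the copy appends "ababababab".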

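Data groups and chunks are decoded until the end of the destination data is reached, that is, until the number of bytes given in the header has been written. The last data group may therefore contain fewer than 8 chunks.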

Examples

Decompression

GNU C example
const u8 * src      = // pointer to start of source
const u8 * src_end  = // pointer to end of source (last byte +1)
u8 * dest           = // pointer to start of destination
u8 * dest_end       = // pointer to end of destination (last byte +1)

u8  group_head      = 0; // current group header byte ...
int group_head_len  = 0; // ... and the number of its bits not yet processed

while ( src < src_end && dest < dest_end )
{
    if (!group_head_len)
    {
        //*** start a new data group and read the group header byte.

        group_head = *src++;
        group_head_len = 8;
    }

    group_head_len--;
    if ( group_head & 0x80 )
    {
        //*** bit in group header byte is set -> copy 1 byte direct

        *dest++ = *src++;
    }
    else
    {
        //*** bit in group header byte is not set -> back reference (run length encoding)

        // read the first 2 bytes of the chunk
        const u8 b1 = *src++;
        const u8 b2 = *src++;

        // calculate the source position: go back RRR+1 bytes in the output
        const u8 * copy_src = dest - (( b1 & 0x0f ) << 8 | b2 ) - 1;

        // calculate the number of bytes to copy
        int n = b1 >> 4;

        if (!n)
            n = *src++ + 0x12; // N==0 -> read third byte
        else
            n += 2; // add 2 to length
        ASSERT( n >= 3 && n <= 0x111 );

        // a validity check: the back reference must not start before the
        // destination buffer (szs->data in the original source, presumably
        // its first byte) and the copy must not run past its end
        if ( copy_src < szs->data || dest + n > dest_end )
            return ERROR("Corrupted data!\n");

        // copy chunk data byte by byte.
        // don't use memcpy() or memmove() here because
        // they don't work with self referencing chunks.
        while ( n-- > 0 )
            *dest++ = *copy_src++;
    }

    // shift group header byte
    group_head <<= 1;
}

// some assertions to find errors in debugging mode
ASSERT( src <= src_end );
ASSERT( dest <= dest_end );

This code example is taken from Wiimms SZS Tools: SVN repository lib-szs.c line 162
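
The example above depends on declarations from its original source (u8, ASSERT, ERROR, szs). For experiments, the same loop can be written as a self-contained function. The following is a sketch under assumptions, not the code of Wiimms SZS Tools: the name yaz0_decode() and its return convention are hypothetical, src is expected to point behind the 16-byte header, and unlike the loop above it also checks every read from the source and reports an error if the source ends before the destination is full.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder: expands the compressed stream src (without the
 * 16-byte header) into exactly dest_len bytes.
 * Returns the number of bytes written, or -1 on truncated or corrupt input. */
long yaz0_decode(const uint8_t *src, size_t src_len,
                 uint8_t *dest, size_t dest_len)
{
    size_t  s = 0, d = 0;     /* read and write positions          */
    uint8_t group_head = 0;   /* current group header byte         */
    int     bits_left  = 0;   /* header bits not yet processed     */

    while (d < dest_len) {
        if (bits_left == 0) {             /* start a new data group */
            if (s >= src_len)
                return -1;
            group_head = src[s++];
            bits_left  = 8;
        }
        bits_left--;

        if (group_head & 0x80) {          /* literal: copy 1 byte */
            if (s >= src_len)
                return -1;
            dest[d++] = src[s++];
        } else {                          /* back reference */
            if (src_len - s < 2)
                return -1;
            const uint8_t b1 = src[s++];
            const uint8_t b2 = src[s++];
            const size_t distance = (((size_t)(b1 & 0x0f) << 8) | b2) + 1;
            size_t n = b1 >> 4;

            if (n == 0) {                 /* 3 byte chunk */
                if (s >= src_len)
                    return -1;
                n = (size_t)src[s++] + 0x12;
            } else {
                n += 2;
            }

            /* reference before start of output, or copy past its end */
            if (distance > d || n > dest_len - d)
                return -1;

            /* byte-wise copy: the chunk may reference itself */
            while (n-- > 0) {
                dest[d] = dest[d - distance];
                d++;
            }
        }
        group_head <<= 1;                 /* next chunk, next bit */
    }
    return (long)d;
}
```

For example, the five source bytes C0 61 62 80 01 decode to the 12 bytes "abababababab": two literal bytes followed by one self-referencing chunk with RRR=1 and SIZE=10.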

Tools

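The following tools can handle compressed U8 files (=SZS files):

  • CTools Pack, by MrBean35000vr and Chadderz
  • SZS Modifier, by MrBean35000vr and Chadderz
  • Wexos's Toolbox, by Wexos
  • Wiimms SZS Tools, by Wiimm

Wexos's Toolbox and Wiimms SZS Tools can (de)compress any kind of Yaz0-compressed file. CTools and SZS Modifier can only handle U8 files.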