Difference between revisions of "YAZ0 (File Format)"

Latest revision as of 19:43, 30 May 2021

Yaz0 is a run-length encoding (RLE) compression method. In Mario Kart Wii, most SZS files are Yaz0-compressed U8 files. See »File format: U8 archives« for details.

Data structure

Header

The header of a Yaz0 file is always 16 bytes long. All numeric values are stored as big endian.

Offset	Type	Description
0x00	Char[4]	magic. Always Yaz0 in ASCII.
0x04	UInt32	Size in bytes of the uncompressed data.
0x08	UInt32[2]	Reserved for special use. Always 0 in Mario Kart Wii.

GNU C example

typedef struct yaz0_header_t
{
    char	magic[4];		// always "Yaz0"
    be32_t	uncompressed_size;	// total size of uncompressed data
    be32_t	reserved[2];		// two unsigned integers reserved for special use
}
__attribute__ ((packed)) yaz0_header_t;
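
As an illustration of the header layout above, the following is a minimal sketch of a header check in portable C. The names `read_be32` and `yaz0_parse_header` are hypothetical and not taken from any of the tools listed on this page. The size is assembled byte by byte, so the result does not depend on the endianness of the host.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: read a big-endian 32-bit value byte by byte,
   so the result is independent of the host byte order. */
uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] <<  8) |  (uint32_t)p[3];
}

/* Hypothetical helper: check the 16-byte header and store the
   uncompressed size in *size. Returns 0 on success, -1 on error. */
int yaz0_parse_header(const uint8_t *data, size_t len, uint32_t *size)
{
    /* the header is always 16 bytes and starts with the magic "Yaz0" */
    if (len < 16 || memcmp(data, "Yaz0", 4) != 0)
        return -1;

    *size = read_be32(data + 4);   /* offset 0x04: uncompressed size */
    return 0;                      /* offsets 0x08..0x0f are reserved */
}
```

A caller would typically use the returned size to allocate the destination buffer before decoding the data groups.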

Data Groups

The complete compressed data is organized in data groups. Each data group consists of 1 group header byte and 8 chunks.

N	Size	Description
1	1 byte	the group header byte
8	1-3 bytes	8 chunks

Each bit of the group header corresponds to one chunk:

The MSB (most significant bit, 0x80) corresponds to chunk 1
The LSB (least significant bit, 0x01) corresponds to chunk 8

A set bit (=1) in the group header means that the chunk is exactly 1 byte long. This byte must be copied to the output stream 1:1. A cleared bit (=0) means that the chunk is 2 or 3 bytes long and is interpreted as a back-reference to already decompressed data that must be copied.
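
The bit-to-chunk mapping can be sketched as a small test function. The name `yaz0_chunk_is_literal` is hypothetical and only serves this illustration; chunks are numbered 1 to 8 as in the list above.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if chunk `index` (1..8) of a data group
   is a single literal byte, 0 if it is a 2 or 3 byte back-reference.
   Chunk 1 is bit 0x80 (MSB), chunk 8 is bit 0x01 (LSB). */
int yaz0_chunk_is_literal(uint8_t group_head, int index)
{
    return (group_head >> (8 - index)) & 1;
}
```

For a group header byte of 0xA0 (binary 1010 0000), chunks 1 and 3 are literal bytes and all other chunks are back-references.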

Size	Data Bytes	N	Size Calculation
2 bytes	`NR RR`	`N = 1..f`	SIZE = N+2 (=3..0x11)
3 bytes	`0R RR NN`	`N = 00..ff`	SIZE = N+0x12 (=0x12..0x111)

RRR is a value between 0x000 and 0xfff. Go back RRR+1 bytes in the output stream to find the start of the data to be copied.
SIZE is calculated from N (see above) and declares the number of bytes to be copied.
Note that a chunk may reference itself: the copy may read bytes that the same chunk has just written. For example, if RRR=1 (go back 1+1=2 bytes) and SIZE=10, the previous 2 bytes are repeated 10/2=5 times.
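
The field layout and the self-referencing copy described above can be sketched as follows. Both function names are hypothetical. The caller is assumed to have verified that enough chunk bytes are available and that `distance` does not reach back before the start of the output.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: decode the fields of a back-reference chunk.
   Stores RRR+1 in *distance and SIZE in *size.
   Returns the number of chunk bytes consumed (2 or 3). */
size_t yaz0_decode_backref(const uint8_t *chunk,
                           unsigned *distance, unsigned *size)
{
    const unsigned n   = chunk[0] >> 4;                       /* N   */
    const unsigned rrr = ((chunk[0] & 0x0fu) << 8) | chunk[1]; /* RRR */

    *distance = rrr + 1;
    if (n == 0) {
        *size = chunk[2] + 0x12u;   /* 3 byte form: SIZE = NN + 0x12 */
        return 3;
    }
    *size = n + 2;                  /* 2 byte form: SIZE = N + 2 */
    return 2;
}

/* Hypothetical helper: copy `size` bytes from `distance` bytes back.
   The byte-by-byte loop is intentional: if size > distance, the copy
   reads bytes it has just written, which repeats the pattern. */
uint8_t *yaz0_copy_backref(uint8_t *dest, unsigned distance, unsigned size)
{
    const uint8_t *copy_src = dest - distance;
    while (size-- > 0)
        *dest++ = *copy_src++;
    return dest;
}
```

With the chunk bytes `80 01` this yields a distance of 2 and a size of 10, which matches the example above: if the output so far ends in "ab", the copy appends "ababababab".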

Decoding of data groups and chunks continues until the end of the destination data is reached, so the last data group may contain fewer than 8 chunks.

Examples

Decompression

GNU C example

const u8 * src      = // pointer to start of source
const u8 * src_end  = // pointer to end of source (last byte +1)
u8 * dest           = // pointer to start of destination
u8 * dest_end       = // pointer to end of destination (last byte +1)

u8  group_head      = 0; // group header byte ...
int group_head_len  = 0; // ... and its remaining length (bits left), used to manage groups

while ( src < src_end && dest < dest_end )
{
    if (!group_head_len)
    {
        //*** start a new data group and read the group header byte.

        group_head = *src++;
        group_head_len = 8;
    }

    group_head_len--;
    if ( group_head & 0x80 )
    {
        //*** bit in group header byte is set -> copy 1 byte direct

        *dest++ = *src++;
    }
    else
    {
        //*** bit in group header byte is not set -> back-reference (run length encoding)

        // read the first 2 bytes of the chunk
        const u8 b1 = *src++;
        const u8 b2 = *src++;

        // calculate the source position
        const u8 * copy_src = dest - (( b1 & 0x0f ) << 8 | b2 ) - 1;

        // calculate the number of bytes to copy.
        int n = b1 >> 4;

        if (!n)
            n = *src++ + 0x12; // N==0 -> read third byte
        else
            n += 2; // add 2 to length
        ASSERT( n >= 3 && n <= 0x111 );

        // a validity check (szs->data is presumably the start of the destination buffer)
        if ( copy_src < szs->data || dest + n > dest_end )
            return ERROR("Corrupted data!\n");

        // copy chunk data byte by byte.
        // don't use memcpy() or memmove() here because
        // they don't work with self referencing chunks.
        while ( n-- > 0 )
            *dest++ = *copy_src++;
    }

    // shift group header byte
    group_head <<= 1;
}

// some assertions to find errors in debugging mode
ASSERT( src <= src_end );
ASSERT( dest <= dest_end );

This code example is taken from Wiimms SZS Tools: SVN repository lib-szs.c line 162
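
The fragment above depends on types and macros from its original source (`u8`, `ASSERT`, `ERROR`, `szs->data`). For experimenting outside that code base, the same loop can be sketched as a self-contained function in standard C. This is not the Wiimms SZS Tools implementation: the name `yaz0_decompress`, its parameters and its error convention are assumptions made for this example, and it adds bounds checks on the source that the fragment does not show.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Sketch: decompress a complete Yaz0 file (16-byte header + data groups)
   from src into dest. On success returns 0 and stores the number of
   bytes written in *written; returns -1 on malformed or truncated input
   or if dest_cap is too small. */
int yaz0_decompress(const uint8_t *src, size_t src_len,
                    uint8_t *dest, size_t dest_cap, size_t *written)
{
    if (src_len < 16 || memcmp(src, "Yaz0", 4) != 0)
        return -1;

    /* uncompressed size, big endian, at offset 0x04 */
    const size_t out_len = ((size_t)src[4] << 24) | ((size_t)src[5] << 16)
                         | ((size_t)src[6] <<  8) |  (size_t)src[7];
    if (out_len > dest_cap)
        return -1;

    size_t in = 16, out = 0;
    uint8_t group_head = 0;     /* group header byte */
    int group_head_len = 0;     /* bits left in the group header */

    while (out < out_len) {
        if (group_head_len == 0) {
            if (in >= src_len) return -1;
            group_head = src[in++];
            group_head_len = 8;
        }
        group_head_len--;

        if (group_head & 0x80) {
            /* bit set: copy 1 literal byte */
            if (in >= src_len) return -1;
            dest[out++] = src[in++];
        } else {
            /* bit cleared: back-reference of 2 or 3 bytes */
            if (src_len - in < 2) return -1;
            const uint8_t b1 = src[in++];
            const uint8_t b2 = src[in++];
            const size_t distance = ((size_t)(b1 & 0x0f) << 8 | b2) + 1;
            size_t n = b1 >> 4;
            if (n == 0) {
                if (in >= src_len) return -1;
                n = (size_t)src[in++] + 0x12;
            } else {
                n += 2;
            }
            if (distance > out || n > out_len - out)
                return -1;
            /* byte-by-byte copy so self-referencing chunks work */
            while (n-- > 0) {
                dest[out] = dest[out - distance];
                out++;
            }
        }
        group_head <<= 1;       /* next chunk uses the next bit */
    }

    *written = out;
    return 0;
}
```

A small hand-built input for trying it out is the header for 12 uncompressed bytes followed by the data bytes `C0 61 62 80 01`: two literal bytes "a" and "b", then a back-reference with distance 2 and size 10, giving "abababababab".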

Tools

The following tools can handle compressed U8 files (=SZS files):

CTools Pack, by MrBean35000vr and Chadderz
SZS Modifier, by MrBean35000vr and Chadderz
Wexos's Toolbox, by Wexos
Wiimms SZS Tools, by Wiimm

Wexos's Toolbox and Wiimms SZS Tools can (de)compress any kind of Yaz0-compressed file. CTools and SZS Modifier can only handle U8 files.

@@ Line 1: / Line 1: @@
-'''Yaz0''' is a run length encoding (RLE compression) method. In [[Mario Kart Wii]] most of the [[SZS]] files are '''Yaz0 compressed [[U8]] files.'''
+'''Yaz0''' is a run length encoding (RLE compression) method. In [[Mario Kart Wii]] most of the [[SZS]] files are '''Yaz0 compressed [[U8]] files'''. See »[[File_format#U8_archives|File format: U8 archives]]« for details.
+__TOC__
 == Data structure ==
 === Header ===
+The header of a Yaz0 file is always 16 bytes long. All numeric values are stored as [[big endian]].
-The header of a Yaz0 file is always 16 bytes long. All numeric values stored as [[ big endian]] values.
 {| class="wikitable"
 |-
-! Offset
+! Offset !! Type !! Description
-! Type
-! Description
 |-
-| 0x00 || char[4] || always "Yaz0"
+| 0x00 || Char[4] || '''magic'''. Always ''Yaz0'' in ASCII.
 |-
-| 0x04 || u32 || size of the uncompressed data
+| 0x04 || UInt32 || Size in bytes of the uncompressed data.
 |-
-| 0x08 || char[8] || always zero (padding)
+| 0x08 || UInt32[2] || Reserved for special use. Always 0 in [[Mario Kart Wii]].
 |}
@@ Line 25: / Line 26: @@
      char magic[4]; // always "Yaz0"
      be32_t uncompressed_size; // total size of uncompressed data
-     char padding[8]; // always 0?
+     be32_t reserved[2]; // two unsigned integers reserved for special use
 }
 __attribute__ ((packed)) yaz0_header_t;
@@ Line 31: / Line 32: @@
 === Data Groups ===
+The complete compressed data is organized in '''data groups'''. Each data group consists of 1 group header byte and 8 '''chunks'''.
-The complete compressed data is organized in '''data groups'''. Each data group consists of 1 group header byte an 8 '''chunks''':
 {| class="wikitable"
@@ Line 38: / Line 38: @@
 ! N !! Size !! Description
 |-
-| 1 || 1 || the group header byte
+| 1 || align=center | 1 byte || the group header byte
 |-
-| 8 || 1-3 || 8 chunks
+| 8 || 1-3 bytes || 8 chunks
 |}
-Each bit of the group header corespondents to one chunk:
+Each bit of the group header corresponds to one chunk:
-* The MSB (most significant bit, 0x80) corespondents to chunk 1
+* The MSB (most significant bit, 0x80) corresponds to chunk 1
-* The LSB (lowest significant bit, 0x01) corespondents to chunk 8
+* The LSB (lowest significant bit, 0x01) corresponds to chunk 8
-A set bit (=1) in the group header means, that the chunk is 1 exact 1 byte long. This byte must be copied to the output stream 1:1. A cleared bit (=0) defines, that the chunk is 2 or 3 bytes long interpreted as a back reference to already decompressed data that must be copied.
+A set bit (=1) in the group header means, that the chunk is exact 1 byte long. This byte must be copied to the output stream 1:1. A cleared bit (=0) defines, that the chunk is 2 or 3 bytes long interpreted as a backreference to already decompressed data that must be copied.
 {| class="wikitable"
 |-
-! Size !! 1.B !! 2.B !! 3.B !! Comment
+! Size !! Data Bytes !! colspan=2 | Size Calculation
 |-
-| 2 || <tt>NR</tt> || <tt>RR</tt> || &mdash; || <tt>N=1..f, SIZE=N+2</tt>
+| 2 bytes || <tt>NR RR</tt> || <tt>N = 1..f || SIZE = N+2 (=3..0x11)</tt>
 |-
-| 3 || <tt>0R</tt> || <tt>RR</tt> || <tt>NN</tt> || <tt>N=00..ff, SIZE=N+0x12</tt>
+| 3 bytes || <tt>0R RR NN</tt> || <tt>N = 00..ff || SIZE = N+0x12 (=0x12..0x111)</tt>
 |}
-* <tt>RRR</tt> is a value between <tt>0x000</tt> and <tt>0xfff</tt>. Go back <tt>RRR+1</tt> bytes in the output stream to find the start of the data to be copied.
+* '''<tt>RRR</tt>''' is a value between <tt>0x000</tt> and <tt>0xfff</tt>. Go back <tt>RRR+1</tt> bytes in the output stream to find the start of the data to be copied.
-* <tt>SIZE</tt> is the number of bytes to be copied.
+* '''<tt>SIZE</tt>''' is calculated from '''<tt>N</tt>''' (see above) and declares the number of bytes to be copied.
-* It is important to know the a chunk may reference to itself. For example if <tt>RRR=0</tt> and <tt>SIZE=10</tt> the previos byte is copied 10 times.
+* It is important to know, that a chunk may reference itself. For example if <tt>RRR=1</tt> (go back 1+1=2) and <tt>SIZE=10</tt> the previous 2 bytes are copied 10/2=5 times.
-Decoding data groups and chunks are done until the end of the destination data is reached.
+Decoding data groups and chunks is done until the end of the destination data is reached.
 == Examples ==
 === Decompression ===
 ;GNU C example
 <pre>
@@ Line 74: / Line 73: @@
 u8 * dest_end      = // pointer to end of destination (last byte +1)
-u8  code            = 0; // code ...
+u8  group_head      = 0; // group header byte ...
-int code_len        = 0; // ... and code_len used to manage groups
+int group_head_len  = 0; // ... and it's length to manage groups
 while ( src < src_end && dest < dest_end )
 {
-     if (!code_len--)
+     if (!group_head_len)
      {
-        code = *src++;
+ //*** start a new data group and read the group header byte.
-        code_len = 7;
-    }
-     if ( code & 0x80 )
+        group_head = *src++;
+        group_head_len = 8;
+    }
+    group_head_len--;
+     if ( group_head & 0x80 )
      {
-         // copy 1 byte direct
+         //*** bit in group header byte is set -> copy 1 byte direct
          *dest++ = *src++;
      }
      else
      {
-         // rle part
+         //*** bit in group header byte is not set -> run length encoding
-         const u8  b1 = *src++;
+ // read the first 2 bytes of the chunk
-         const u8  b2 = *src++;
+         const u8 b1 = *src++;
+         const u8 b2 = *src++;
+ // calculate the source position
          const u8 * copy_src = dest - (( b1 & 0x0f ) << 8 | b2 ) - 1;
-         int n = b1 >> 4;
+ // calculate the number of bytes to copy.
-         if (!n)
+         int n = b1 >> 4;
-             n = *src++ + 0x12;
+         if (!n)
+             n = *src++ + 0x12; // N==0 -> read third byte
          else
-             n += 2;
+             n += 2; // add 2 to length
          ASSERT( n >= 3 && n <= 0x111 );
-         if ( copy_src < szs->data && dest + n > dest_end )
+ // a validity check
+         if ( copy_src < szs->data || dest + n > dest_end )
              return ERROR("Corrupted data!\n");
+ // copy chunk data.
          // don't use memcpy() or memmove() here because
          // they don't work with self referencing chunks.
@@ Line 113: / Line 123: @@
              *dest++ = *copy_src++;
      }
-     code <<= 1;
+     // shift group header byte
+    group_head <<= 1;
 }
+// some assertions to find errors in debugging mode
 ASSERT( src <= src_end );
 ASSERT( dest <= dest_end );
 </pre>
-This code example is taken from [[Wiimms SZS Tools]]: SVN repository [http://opensvn.wiimm.de/viewvc/wii/trunk/wiimms-szs-tools/src/lib-szs.c?annotate=2437#l159 lib-szs.c line 159]
+This code example is taken from [[Wiimms SZS Tools]]: SVN repository [http://opensvn.wiimm.de/viewvc/wii/trunk/wiimms-szs-tools/src/lib-szs.c?annotate=2498#l162 lib-szs.c line 162]
+== Tools ==
+The following tools can handle compressed U8 files (=SZS files):
+* [[CTools Pack]], by [[MrBean35000vr]] and [[Chadderz]]
+* [[SZS Modifier]], by [[MrBean35000vr]] and [[Chadderz]]
+* [[Wexos's Toolbox]], by [[Wexos]]
+* [[Wiimms SZS Tools]], by [[Wiimm]]
+[[Wexos's Toolbox]] and [[Wiimms SZS Tools]] can (de)compress any kind of Yaz0-compressed files. [[CTools]] and [[SZS Modifier]] can only handle [[U8]] files.
-[[category:File Format]]
+[[Category:File Format/Other]]