Difference between revisions of "YAZ0 (File Format)"
Revision as of 15:02, 7 April 2011
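Yaz0 is a compression method that mixes literal bytes with back references to already decompressed data; run length encoding (RLE) is covered as a special case of these back references. In Mario Kart Wii most of the SZS files are Yaz0 compressed U8 files.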
Data structure
Header
The header of a Yaz0 file is always 16 bytes long. All numeric values are stored in big-endian byte order.
Offset | Type | Description |
---|---|---|
0x00 | char[4] | always "Yaz0" |
0x04 | u32 | size of the uncompressed data |
0x08 | char[8] | always zero (padding) |
- GNU C example
    typedef struct yaz0_header_t
    {
        char   magic[4];            // always "Yaz0"
        be32_t uncompressed_size;   // total size of uncompressed data
        char   padding[8];          // always 0?
    }
    __attribute__ ((packed)) yaz0_header_t;
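Reading the header from a raw byte buffer can be sketched as follows. This is a minimal illustration, not code from any existing tool: the function name `yaz0_read_header` and its return convention are assumptions. It reads the size byte by byte, so it does not depend on the host byte order or on struct packing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: checks the 16-byte header and extracts the
 * uncompressed size. Returns 0 on success, -1 on error. */
int yaz0_read_header(const uint8_t *buf, size_t len, uint32_t *size_out)
{
    assert(buf != NULL && size_out != NULL);

    /* The header is 16 bytes long and starts with the magic "Yaz0". */
    if (len < 16 || memcmp(buf, "Yaz0", 4) != 0)
        return -1;

    /* Offset 0x04: u32, big endian (most significant byte first). */
    *size_out = (uint32_t)buf[4] << 24
              | (uint32_t)buf[5] << 16
              | (uint32_t)buf[6] << 8
              | (uint32_t)buf[7];
    return 0;
}
```

The compressed data groups start directly after the header, at offset 0x10.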
Data Groups
The complete compressed data is organized in data groups. Each data group consists of 1 group header byte and 8 chunks:
N | Size | Description |
---|---|---|
1 | 1 | the group header byte |
8 | 1-3 | 8 chunks |
Each bit of the group header corresponds to one chunk:
- The MSB (most significant bit, 0x80) corresponds to chunk 1
- The LSB (least significant bit, 0x01) corresponds to chunk 8
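The bit order can be sketched as a small helper. The name `yaz0_chunk_bit` is hypothetical and only serves to illustrate the MSB-first mapping between header bits and chunk numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the bit for chunk `k` (1..8) is set
 * in the group header byte, 0 otherwise. */
int yaz0_chunk_bit(uint8_t group_header, int k)
{
    assert(k >= 1 && k <= 8);

    /* Chunk 1 -> mask 0x80, chunk 2 -> mask 0x40, ..., chunk 8 -> mask 0x01. */
    return (group_header & (0x80 >> (k - 1))) != 0;
}
```

For a group header of 0xA0 (binary 1010 0000), the helper reports a set bit for chunks 1 and 3 and a cleared bit for all other chunks.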
A set bit (=1) in the group header means that the chunk is exactly 1 byte long. This byte must be copied to the output stream 1:1. A cleared bit (=0) means that the chunk is 2 or 3 bytes long and is interpreted as a back reference to already decompressed data that must be copied.
Chunk size (bytes) | Byte 1 | Byte 2 | Byte 3 | Comment |
---|---|---|---|---|
2 | NR | RR | — | N=1..f, SIZE=N+2 |
3 | 0R | RR | NN | N=00..ff, SIZE=N+0x12 |
- RRR is a value between 0x000 and 0xfff. Go back RRR+1 bytes in the output stream to find the start of the data to be copied.
- SIZE is the number of bytes to be copied.
- Note that a chunk may reference itself, which means that the copy source may overlap the bytes being written. For example, if RRR=0 and SIZE=10, the previous byte is copied 10 times.
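The two back reference layouts and the self-referencing copy can be sketched as follows. Both function names are hypothetical, and the caller is assumed to have checked that enough input and output space is available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes the back reference starting at `p`.
 * Stores the distance to go back (RRR+1) and SIZE, and returns the
 * number of chunk bytes consumed (2 or 3). */
size_t yaz0_parse_backref(const uint8_t *p, unsigned *distance, unsigned *size)
{
    assert(p != NULL && distance != NULL && size != NULL);

    const unsigned n = p[0] >> 4;                    /* upper nibble: N */
    *distance = (((p[0] & 0x0fu) << 8) | p[1]) + 1;  /* RRR + 1 */
    if (n != 0) {
        *size = n + 2;                               /* 2-byte form */
        return 2;
    }
    *size = p[2] + 0x12u;                            /* 3-byte form */
    return 3;
}

/* Copies `size` bytes from `distance` bytes behind `dest`, one byte at a
 * time, so that bytes written earlier in the same copy can be read again.
 * Returns the advanced destination pointer. */
uint8_t *yaz0_copy_backref(uint8_t *dest, unsigned distance, unsigned size)
{
    const uint8_t *from = dest - distance;
    while (size-- > 0)
        *dest++ = *from++;
    return dest;
}
```

The chunk bytes 0x80 0x00 decode to N=8 and RRR=0, which gives a distance of 1 and SIZE=10. Applied to an output that so far contains a single byte, the copy repeats that byte 10 times.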
Decoding of data groups and chunks continues until the end of the destination data is reached, so the last data group may contain fewer than 8 chunks.
Examples
Decompression
- GNU C example
    const u8 * src      = // pointer to start of source
    const u8 * src_end  = // pointer to end of source (last byte +1)
    u8 * dest           = // pointer to start of destination
    u8 * dest_end       = // pointer to end of destination (last byte +1)

    u8  code     = 0;   // code ...
    int code_len = 0;   // ... and code_len used to manage groups

    while ( src < src_end && dest < dest_end )
    {
        if (!code_len--)
        {
            // start of a new group: read the group header byte
            code = *src++;
            code_len = 7;
        }

        if ( code & 0x80 )
        {
            // copy 1 byte direct
            *dest++ = *src++;
        }
        else
        {
            // rle part (back reference)
            const u8 b1 = *src++;
            const u8 b2 = *src++;
            const u8 * copy_src = dest - (( b1 & 0x0f ) << 8 | b2 ) - 1;

            int n = b1 >> 4;
            if (!n)
                n = *src++ + 0x12;
            else
                n += 2;
            ASSERT( n >= 3 && n <= 0x111 );

            // szs->data is the start of the destination buffer in the original tool;
            // fail if the copy starts before it or runs past the end of the destination
            if ( copy_src < szs->data || dest + n > dest_end )
                return ERROR("Corrupted data!\n");

            // don't use memcpy() or memmove() here because
            // they don't work with self referencing chunks.
            while ( n-- > 0 )
                *dest++ = *copy_src++;
        }
        code <<= 1;
    }
    ASSERT( src <= src_end );
    ASSERT( dest <= dest_end );
This code example is taken from Wiimms SZS Tools: SVN repository lib-szs.c line 159
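For experiments outside the original tool, the excerpt above can be wrapped in a self-contained function. The following is a sketch under stated assumptions, not the Wiimms SZS Tools code: the name `yaz0_decode`, the index-based interface and the return convention are hypothetical. Unlike the excerpt, it also checks every read from the source and treats a source that ends before the destination is full as an error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of a bounds-checked decoder for the data that follows the
 * 16-byte header. Returns the number of bytes written to `dest`, or
 * -1 if the input is truncated or a back reference is invalid. */
long yaz0_decode(const uint8_t *src, size_t src_len,
                 uint8_t *dest, size_t dest_len)
{
    size_t in = 0, out = 0;
    uint8_t code = 0;      /* current group header, shifted left per chunk */
    int chunks_left = 0;   /* chunks remaining in the current group */

    assert(src != NULL && dest != NULL);

    while (out < dest_len) {
        if (chunks_left == 0) {
            /* start of a new group: read the group header byte */
            if (in >= src_len) return -1;
            code = src[in++];
            chunks_left = 8;
        }
        if (code & 0x80) {
            /* set bit: copy one literal byte */
            if (in >= src_len) return -1;
            dest[out++] = src[in++];
        } else {
            /* cleared bit: back reference of 2 or 3 bytes */
            if (src_len - in < 2) return -1;
            const unsigned b1 = src[in++];
            const unsigned b2 = src[in++];
            const size_t distance = (((b1 & 0x0f) << 8) | b2) + 1;
            size_t n = b1 >> 4;
            if (n == 0) {
                if (in >= src_len) return -1;
                n = src[in++] + 0x12u;
            } else {
                n += 2;
            }
            /* reject copies from before the start or past the end */
            if (distance > out || n > dest_len - out) return -1;
            /* byte-wise copy: the source range may overlap the output */
            while (n-- > 0) {
                dest[out] = dest[out - distance];
                out++;
            }
        }
        code <<= 1;
        chunks_left--;
    }
    return (long)out;
}
```

As a small worked input, the five bytes 0xC0 'A' 'B' 0x40 0x01 describe two literal chunks followed by a back reference with a distance of 2 and SIZE=6. With an 8-byte destination they decode to "ABABABAB".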