HAL DAT (File Format)

From Custom Mario Kart
Under Construction
This article is not finished. Help improve it by adding accurate information or correcting grammar and spelling.


This page describes the HAL Labs HSD Archive (*.dat) file format, as found in these games:

Game Name
Super Smash Bros Melee
Kirby Air Ride
Mario Kart Arcade GP
Mario Kart Arcade GP 2
Kirby's Return to Dreamland
Wii Channel TV

These files can also be found in a variety of other games, inside containers or compressed. Games that use these files also use sysdolphin, a library for the GameCube and Wii that sped up game development by providing common functionality such as handling of models, textures, and cameras.

NOTE: this format's extreme simplicity makes it complicated to understand, and there is still quite a lot to be deciphered from it.

Beginner information

If you are unfamiliar with how game resource archives are built up, please read Xentax's Definitive Guide to Exploring File Formats. It contains the background you need to start understanding how HAL DAT files work.

File Format

The HAL Labs. HSD-Archive (*.dat) format can be described as an unordered multi-root hierarchy (tree) of various data structures.

All pointers are relative to the Data Block (0x20) except for string pointers.

NOTE: "pointer" and "offset" mean the same thing on this page; a field is called an "offset" when it is named after the data found at that location (for example, "Data Offset").

File Header

Offset Size Format Description
0x00 4 unsigned File Size
0x04 4 pointer Pointer Table Offset
0x08 4 unsigned Pointer Count
0x0C 4 unsigned Root Node Count
0x10 4 unsigned Reference Node Count
0x14 4 string Unknown. "001B" when used.
0x18 4 unsigned Unknown. Padding?
0x1C 4 unsigned Unknown. Padding?
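
As a minimal sketch, the 32-byte header above can be unpacked with Python's struct module. Big-endian byte order is assumed here (it matches the bu32 reads in the path-finder code on this page), and the names DatHeader and parse_header are illustrative only. This is plain Python 3, not UMC-script.

```python
import struct
from collections import namedtuple

# Field names are illustrative; they mirror the rows of the header table.
DatHeader = namedtuple(
    "DatHeader",
    "file_size pointer_table_offset pointer_count root_count ref_count "
    "version pad0 pad1",
)

HEADER_SIZE = 0x20  # the Data Block starts right after the header


def parse_header(data):
    """Unpack the 32-byte header (assumed big-endian)."""
    if len(data) < HEADER_SIZE:
        raise ValueError("need at least 32 bytes for the header")
    return DatHeader(*struct.unpack(">5I4s2I", data[:HEADER_SIZE]))


# Synthetic header for demonstration, not taken from a real file.
example = struct.pack(">5I4s2I", 0x40, 0x10, 1, 1, 0, b"001B", 0, 0)
header = parse_header(example)
# Absolute address of the pointer table = relative offset + 0x20.
pointer_table_address = header.pointer_table_offset + HEADER_SIZE
```

Adding HEADER_SIZE converts the data-block-relative Pointer Table Offset into an absolute file address.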

Data Block

The Data Block consists of the data, structures, and string table in the archive.

String Table

The String Table is an array of 0-terminated strings which the string pointers in some structs point to directly.
This array is not aligned (or allocated), and the pointer table begins directly after the last 0-termination byte.

Pointer Table

The Pointer Table lists the location of every valid pointer in the data block. A value of 0x00000000 counts as a pointer only if its location is listed here; otherwise it is invalid (a null pointer).
The order of the list is exactly the order of the pointers in the data block.

Offset Size Format Description
Pointer Table Offset Pointer Count * 4 pointer Pointer Offset
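
The following sketch reads the table and then follows each entry to the pointer value stored at that location. It assumes big-endian 32-bit values and that both the table offset and its entries are relative to the Data Block (0x20); the function names are made up for illustration.

```python
import struct

DATA_BLOCK = 0x20  # pointers are relative to the start of the Data Block


def read_pointer_table(data, table_offset, count):
    """Return the data-block-relative addresses listed in the Pointer Table."""
    start = DATA_BLOCK + table_offset
    return list(struct.unpack_from(">%dI" % count, data, start))


def read_pointer_targets(data, addresses):
    """Follow each listed address and return the pointer value stored there."""
    return [struct.unpack_from(">I", data, DATA_BLOCK + a)[0] for a in addresses]


# Synthetic file: 32-byte header, 8 bytes of data, then a 1-entry pointer table.
# The first data word is a pointer (value 0x4); the second is plain data.
blob = bytes(32) + struct.pack(">II", 0x4, 0xAABBCCDD) + struct.pack(">I", 0x0)
addresses = read_pointer_table(blob, table_offset=8, count=1)
targets = read_pointer_targets(blob, addresses)
```

Here the table holds one entry (address 0x0), and the value stored at that address is the pointer 0x4.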

Root Nodes

This is where our journey through the data begins as these contain the data and string offsets.

Offset Size Format Description
0x00 4 pointer Data Offset
0x04 4 pointer String Offset (relative to String Table)

Reading the strings means reading until a stop character of '\x00'.
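
A hedged sketch of reading root nodes and their names follows. The caller supplies both nodes_start and strings_start as absolute addresses; the path-finder code on this page locates the root nodes directly after the Pointer Table, while the start of the String Table depends on the Reference Nodes, whose layout is not documented here. All function names are illustrative.

```python
import struct

DATA_BLOCK = 0x20


def read_cstring(data, start):
    """Read bytes from start up to (not including) the next 0x00 byte."""
    end = data.index(b"\x00", start)
    return data[start:end].decode("ascii", errors="replace")


def read_root_nodes(data, nodes_start, count, strings_start):
    """Return (absolute data address, name) for each 8-byte root node."""
    roots = []
    for i in range(count):
        data_off, str_off = struct.unpack_from(">II", data, nodes_start + i * 8)
        name = read_cstring(data, strings_start + str_off)
        roots.append((DATA_BLOCK + data_off, name))
    return roots


# Synthetic layout: header, one root node, then the string table.
blob = bytes(32) + struct.pack(">II", 0x0, 0x0) + b"example_joint\x00"
roots = read_root_nodes(blob, nodes_start=32, count=1, strings_start=40)
```

The Data Offset is rebased onto the Data Block, while the String Offset is rebased onto the String Table, as the table above states.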

Reference Nodes

Yet to be seen.

Following these there is another String Table, which has no definite size and no padding.
The strings seem to be used as identifier keys for the data (similar to how a Python dict or JavaScript object identifies its data).


Identifying the Root Structure

Currently 2 methods have been used to decide what the root structure is.

The String Analysis Method

This simple method was designed by Revel8n and only applies to Super Smash Bros. Melee. It analyzes the string of the Root Node and searches it for known keyword patterns to identify the root structure.
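
The idea can be sketched as a suffix lookup. The keyword table below is hypothetical: the real keyword list used by this method is not given in this article, and the labels are placeholders.

```python
# Hypothetical keyword table: (suffix, structure label). Longer suffixes come
# first so that "_matanim_joint" is not mistaken for a plain "_joint".
KEYWORDS = [
    ("_matanim_joint", "matanim"),
    ("_joint", "bone"),
]


def identify_root(name):
    """Return the label of the first keyword the root name ends with."""
    for suffix, label in KEYWORDS:
        if name.endswith(suffix):
            return label
    return None  # unknown root string
```

A root string that matches no keyword returns None, which is where a method like the Path-Finder would have to take over.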

The Path-Finder Method

This more complex method was designed by Tcll. It tests structure paths, sizes, pointer locations, and pointer offsets against pre-defined (known) structures.
Structures that pass every check are tried, and a candidate path is discarded as soon as any check fails. The method works with Super Smash Bros. Melee, Kirby Air-Ride, and recently Wii Channel TV, with more support to come.

Code: ( Python 2.7 / UMC-script v3.0 )
Full code: not available yet.

Improved method using a decorator class (less work to define structures):

Step 1: Create a dictionary which holds the info for the known structures.

   globals()['structs'] = {
   #   struct_name: [ expected_size, struct_function (for root structs) or _pass, {
   #       pointer_addr: sub-struct_name,
   #       ...
   #   }, isArray=False ],
   #   ...
   }

Step 2: Define the decorator class which does half our work for us (registering the function name and function or _pass with the given info on init).

   class structPointers(object ):
       """registers structure definitions for the pathfinder
       - size (int): the size of the struct in bytes (-1 for undefined)
       - pointers (dict): location of pointer-value : expected struct name
       - isarray (bool): notifies the path-finder to multiply "size" to match the overflow before testing for false padding
       - root (bool): marks this struct as a root struct
       @structPointers( 8, { 0:'_pass', ... } )
       def struct2( offset ): ...
       @structPointers( 32, { 0:'root_struct1', 4:'struct2', ... }, root=True )
       def root_struct1( offset ): ...
       # NOTE: see structs below for more examples.
       """
       def __init__( this, size, pointers={}, isArray=False, root=False ):
           this.size=size; this.pointers = pointers; this.isarray = isArray; this.root = root
       def __call__( this, struct_function ):
           globals()['structs'][struct_function.__name__] = [ this.size, struct_function if this.root else _pass , this.pointers, this.isarray ]
           return struct_function

Step 3: Define the structures like so (associate the size, the struct names at each pointer location, isArray, and root with the function).
Note: isArray tells the path-finder to treat the given size as a multiple of the expected size and to test only the remainder for padding.

   @structPointers( 64, {
       0:'_pass', # string
       56:'_pass' # (IB matrix)
   }, root=True)
   def _bone(Bone_Offset, parent=None, prev=None, rig ='joint_obj'):
       # file operation goes here
       pass

Step 4: Gather and sort the pointers to each structure using the pointer table and root structure pointers.

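   relocation_array = array(bu32,count=pointer_cnt); relocation_array.__color__ = 0x40FF40
   relocations = relocation_array(offset=pointer_tbl,label={' -- relocations':' -- pointer-address'})
   pointers = {pointer_tbl} # using a set to efficiently remove duplicate entries
   for addr in relocations:
       jump(addr+32) # assumed: relocation addresses are relative to the data block (0x20)
       p = bu32( label=' -- Pointer' ); p.__color__ = 0xBBFFBB
       pointers.add(p+32) # store the absolute struct address (Step 5 tests against these)
   for i in range(root_cnt):
       jump(pointer_tbl+(pointer_cnt*4)+(i*8)) # root node i, same address as used in Step 5
       p = bu32( label=' -- Root Pointer' ); p.__color__ = 0xFFDDBB
       pointers.add(p+32)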

Step 5: Find a valid path

   def test_path(struct_name, given_size, struct_offset):
       """walker to validate the path recursively"""
       expected_size, func, ptrs, isArray = structs[struct_name]
       if expected_size == -1: return True # allow ignorance
       if given_size != expected_size:
           if given_size > expected_size:
               padSize = given_size%expected_size if isArray else given_size-expected_size
               if sum(array(bu8,count=padSize,offset=struct_offset+(given_size-padSize))( label=' -- pad-byte validity')):
                   print('      invalid path: given size %i > expected size %i for struct %s'%(given_size,expected_size,struct_name)); return False
           # this error shouldn't occur, but just in case, it's better to catch it anyway...
           else: print('      invalid path: given size %i < expected size %i for struct %s'%(given_size,expected_size,struct_name)); return False
       pid = 0
       jump(struct_offset, label=' -- validating struct %s'%struct_name)
       for i in range(expected_size): # !important: test each byte of the structure
           if i in ptrs:
               name = ptrs[i]
               location = bu32(label=' -- pointer %i of struct %s'%(pid,struct_name))+32
               pid += 1
               if (struct_offset+i)-32 not in relocations: # address of this pointer field, relative to the data block
                   if location==32: continue # 0-pointer
                   else: print('      invalid path: 0-pointer expected, but found data at location %i for struct %s.'%(i,struct_name)); return False
               if location in pointers: size = min([address for address in pointers if address>location])-location
               else: print("      invalid path: couldn't determine size of sub-struct %s at location %i for struct %s."%(name,i,struct_name)); return False
               if not test_path(name, size, location): return False
           elif (struct_offset+i)-32 in relocations: print('      invalid path: pointer found in expected data-space at location %i for struct %s.'%(i,struct_name)); return False
       return True
   # TODO: test 2nd-priority root structures of non-fixed size, such as raw image data.
   # TODO: struct priority (2 structs have the same size, which do we use first?)
   # TODO: gather all root structs, and determine the order from what is given (matanim should be parsed before joint, but yet is supplied afterwards).
   # wisdom from Tcll: you can't trust anything for what it is in this format.
   for i in range(root_cnt):
       jump(pointer_tbl+(pointer_cnt*4)+(i*8), label=' -- Root Structs') # get to the root node
       root_offset = bu32(label=' -- Data Offset')+32
       string_offset = bu32(label=' -- string Offset') # could be a dictionary key: { str(key): value }
       root_size = min([address for address in pointers if address>root_offset])-root_offset # a root pointer should never be the last pointer.
       # ^ note: this is the given size, which could be larger than the expected size.
       # resolve the given root size into a dictionary of possible root structures (by size)
       roots = {root_size:[]} # { 48: ['root5', ... ], 44: ['root8', ... ] } (valid sizes determined from if sum(padding)==0)
       # ^ initial root size included to prevent pad-scanning issues.
       for structname, (structsize, structfunc, structptrs, isArray) in structs.items():
           if structfunc!=_pass: # root structures always have their own function
               if 0<structsize<=root_size:
                   # adding keys while iterating over roots would raise an error, so test membership instead
                   if structsize in roots: roots[structsize].append(structname)
                   else:
                       jump(root_offset+structsize, label=' -- validating root struct oversize')
                       pad_bytes = array(bu8, count=root_size-structsize)( label=' -- initial pad-byte validity' )
                       if sum(pad_bytes)==0: roots[structsize] = [structname]
       found = False
       rl = len(roots)
       print('\nfound %i possible size categor%s for root %i of size %i:'%(rl, 'y' if rl == 1 else 'ies', i, root_size))
       for _size, root_names in roots.items():
           rs = len(root_names)
           if root_names:
               print('  scanning %i possible root struct%s of size %i.'%(rs, '' if rs == 1 else 's', _size))
               for ni,root_name in enumerate(root_names,1):
                   print('    %i: %s'%(ni, root_name))
                   if test_path(root_name, _size, root_offset):
                       print('      found a valid path, attempting to parse...')
                       # noinspection PyBroadException
                       try: structs[root_name][1](root_offset); print('      parsing succeeded'); found=True; break
                       except Exception: # any parsing error means this root struct is not the right one
                           import sys,traceback # TODO: remove
                           print('      parsing failed, reason:\n'); traceback.print_exception( *sys.exc_info() ); print()
               if found: break
               else: print('    could not find a valid path.')
       if not found: print("could not find anything out of what's currently known.")
       # Tcll - possible test: if index has something to do with selection

Extra documentation:

   # Tcll - What does this do?
   # Well to put it into perspective (if it looks like a duck and quacks like a duck, then it must be a duck),
   # all we are given is a list of pointer addresses (relocations), and a root pointer (aside from a useless string address).
   # We don't know what this mysterious root pointer points to, so we have to guess,
   # but first we need to collect some information to give us an accurate guess...
   # So we resolve the pointer addresses into a list of struct addresses (or pointers to structs),
   # and add any root pointers to that list (saving the pointer addresses for testing if an expected pointer exists).
   # Now we can make a good guess as to the size of our root struct, as well as the sizes of its children.
   # For some added information, what we have done with the decorator-class above (structPointers),
   # is build a collection of pre-defined structs with their sizes, (relative) pointer locations, and associate functions.
   # Now what we do with our given and expected information is put them together to determine our result.
   # First we start with the root size and gather structures matching or less than the given size.
   # Once we have a struct, if the size is smaller than the given size, we test that the extended data is pad-bytes (0s).
   # If valid, we now test that the expected pointer locations are valid (also searching for invalid pointers).
   # (this is why we saved our pointer addresses mentioned above)
   # Finally, if those are valid, we test that the structs at those pointers are valid,
   # following the same recursive routine until we hit a 'pass'.
   # (which marks either data, something unknown, or a struct with an undetermined size)
   # NOTE: ignorant tests with struct sizes of -1 are in place for unknown data and 'pass'.
   # NOTE: testing arrays of structures is done by modulo-ing (%) the given struct size by the associated expected size,
   # and using the remainder to test for pad-bytes (yes, modulo is the remainder of divide (/)).
   # If that is valid, test the first structure of the array.
   # If all tests are positive, we can call our associated root-struct function.
   # NOTE: (is it really a duck?) just because all tests are positive does not mean we have a valid structure-path.
   # Note that the testing done is only based on what is known, so certain things can easily look like,
   # and be mistaken for other things without enough collected information (like the current matanim struct for example).
   # Unfortunately there is not much of a discrete way to validate the data in structs without doing something overly complex,
   # meaning testing would take hours due to the verbosity of the testing depth.
   # So it is left up to an exception to catch the invalid data and attempt to try another root struct.
   # This can be a double-edged sword though as the path may be correct,
   # while the invalid data is just something left to be figured out, causing the path-finder to choke.
   # (at this point, any invalid data added to UGE's backend is not removed, the data is simply dealt with until finished)
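
Two of the checks described above can be sketched in plain Python 3: guessing a struct's size from the next known address, and testing that any excess bytes are zero padding. This is a simplified stand-in and not the UMC-script code above; the helper names are illustrative.

```python
import bisect


def given_size(sorted_addresses, address):
    """Size guess: distance from address to the next larger known address."""
    i = bisect.bisect_right(sorted_addresses, address)
    if i == len(sorted_addresses):
        raise ValueError("no known address after 0x%X" % address)
    return sorted_addresses[i] - address


def padding_is_valid(data, offset, given, expected, is_array=False):
    """True when the bytes beyond the expected size are all zero."""
    if expected <= 0:
        return True  # undefined size (-1): nothing to test
    if given < expected:
        return False
    # For arrays only the remainder after whole elements counts as padding.
    pad = given % expected if is_array else given - expected
    tail = data[offset + given - pad: offset + given]
    return not any(tail)


addresses = sorted({0x20, 0x60, 0x70})
size = given_size(addresses, 0x20)  # 0x60 - 0x20
```

Sorting the address set once and using bisect avoids rescanning every pointer for each struct, which the min() search in the original does.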

Structure Layout

Currently, only a little is known about the full structure layout, but here is what is known so far.

Super Smash Bros. Melee

Mesh layout: (SSBM Pl*.dat, Ty*.dat)
└ Bone
  ├ Bone
  └ Object
    ├ Material
    │ ├ Colors
    │ └ Texture
    │   ├ Image
    │   │ └ (Image Data)
    │   ├ Palette
    │   │ └ (Palette Data)
    │   ├ Unknown1
    │   └ Texture
    ├ Mesh
    │ ├ Attributes
    │ │ └ (Vector Data)
    │ ├ Influence Matrix Array
    │ │ └ Weight Array
    │ │   └ Weight
    │ └ Display List
    │   └ (Sub-Vector Data (if any) and/or Indexes)
    └ Object

Structure Definitions

Super Smash Bros. Melee

Bone Structures (Root Structure)

Offset Size Format Description
0x00 4 pointer String Offset (typically unused)
0x04 4 unsigned Unknown Flags.
0x08 4 pointer Child Bone Struct
0x0C 4 pointer Next Bone Struct
0x10 4 pointer Object Struct
0x14 4 float Rotation X
0x18 4 float Rotation Y
0x1C 4 float Rotation Z
0x20 4 float Scale X
0x24 4 float Scale Y
0x28 4 float Scale Z
0x2C 4 float Location X
0x30 4 float Location Y
0x34 4 float Location Z
0x38 4 pointer Inverse World Bind Matrix (use parent matrix if 0)
0x3C 4 pointer Unknown.
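
A minimal sketch of reading this struct and walking the bone tree, in plain Python 3. It assumes big-endian values, pointers relative to the Data Block, and that a pointer value of 0 means "none"; the Bone tuple and function names are illustrative.

```python
import struct
from collections import namedtuple

Bone = namedtuple(
    "Bone",
    "name_ptr flags child next obj rotation scale location "
    "inv_bind_ptr unknown_ptr",
)

BONE_FORMAT = ">5I9f2I"  # 5 words, 9 floats, 2 pointers = 0x40 bytes


def parse_bone(data, offset):
    """Unpack one 0x40-byte bone struct at an absolute file offset."""
    f = struct.unpack_from(BONE_FORMAT, data, offset)
    return Bone(f[0], f[1], f[2], f[3], f[4],
                f[5:8], f[8:11], f[11:14], f[14], f[15])


def walk_bones(data, offset, depth=0, data_block=0x20):
    """Yield (depth, bone) for a bone, its children, then its siblings."""
    while True:
        bone = parse_bone(data, offset)
        yield depth, bone
        if bone.child:  # assumed: 0 means no child
            yield from walk_bones(data, data_block + bone.child,
                                  depth + 1, data_block)
        if not bone.next:  # assumed: 0 means no further sibling
            return
        offset = data_block + bone.next
```

Children are visited recursively and siblings iteratively, so a long sibling chain does not deepen the call stack.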

Matrix (3x4)

Offset Size Format Description
0x00 4 float Rotation 1 1
0x04 4 float Rotation 1 2
0x08 4 float Rotation 1 3
0x0C 4 float Translation X
0x10 4 float Rotation 2 1
0x14 4 float Rotation 2 2
0x18 4 float Rotation 2 3
0x1C 4 float Translation Y
0x20 4 float Rotation 3 1
0x24 4 float Rotation 3 2
0x28 4 float Rotation 3 3
0x2C 4 float Translation Z
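
For illustration, the twelve floats can be read as three rows of four values and extended to a square matrix. Big-endian floats are assumed, and the implied bottom row (0, 0, 0, 1) is the usual convention for affine transforms rather than something stored in the file.

```python
import struct


def read_matrix_3x4(data, offset):
    """Read 12 big-endian floats as three rows of four values."""
    v = struct.unpack_from(">12f", data, offset)
    return [list(v[0:4]), list(v[4:8]), list(v[8:12])]


def to_4x4(rows):
    """Append the implied bottom row (0, 0, 0, 1) to get a square matrix."""
    return [list(r) for r in rows] + [[0.0, 0.0, 0.0, 1.0]]


# Synthetic matrix: identity rotation with translation (5, 6, 7).
raw = struct.pack(">12f", 1, 0, 0, 5, 0, 1, 0, 6, 0, 0, 1, 7)
matrix = to_4x4(read_matrix_3x4(raw, 0))
```

The translation ends up in the last column of each row, matching the table order above.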

Object Structures

Offset Size Format Description
0x00 4 pointer String Offset (typically unused)
0x04 4 pointer Next Object Struct
0x08 4 pointer Material Struct
0x0C 4 pointer Mesh Struct

Material Structures

Offset Size Format Description
0x00 4 pointer String Offset (typically unused)
0x04 4 unsigned Unknown Render Mode Flags
0x08 4 pointer Texture Struct
0x0C 4 pointer Color Struct
0x10 4 pointer Unknown Render Struct
0x14 4 pointer Pixel Processing Struct

Texture Structures

Offset Size Format Description
0x00 4 pointer String Offset (typically unused)
0x04 4 pointer Next Texture Struct
0x08 4 unsigned GXTexMapID
0x0C 4 unsigned GXTexGenSrc
0x10 4 float Rotation X
0x14 4 float Rotation Y
0x18 4 float Rotation Z
0x1C 4 float Scale X
0x20 4 float Scale Y
0x24 4 float Scale Z
0x28 4 float Translation X
0x2C 4 float Translation Y
0x30 4 float Translation Z
0x34 4 unsigned Wrap S
0x38 4 unsigned Wrap T
0x3C 1 unsigned Repeat S
0x3D 1 unsigned Repeat T
0x3E 2 unsigned Padding
0x40 4 unsigned Unknown Flags.
0x44 4 float Blending
0x48 4 unsigned Mag Filter (GXTexFilter)
0x4C 4 pointer Image Struct
0x50 4 pointer Palette Struct
0x54 4 pointer LOD Struct
0x58 4 pointer TEV Struct

Image Structures

Offset Size Format Description
0x00 4 pointer Data Offset
0x04 2 unsigned Width
0x06 2 unsigned Height
0x08 4 unsigned Format
Format Stride Description
0x0 0x1 (2 pixels) I4
0x1 0x1 I8
0x2 0x1 IA4
0x3 0x2 IA8
0x4 0x2 RGB565
0x5 0x2 RGB5A3

NOTE: may follow RGB4A3 instead of A3RGB4

0x6 0x4 RGBA8
0x8 0x1 (2 pixels) CI4 (Color Index 4-bit)
0x9 0x1 CI8
0xA 0x2 CI14x2
0xE 0x8 (16 pixels) CMPR (2-color palette)
0x0C 4 unsigned Mipmap (GXBool)
0x10 4 float Min LOD
0x14 4 float Max LOD
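
As a small example of decoding one of the formats above, the sketch below expands RGB565 values (format 0x4, stride 2) to 8-bit channels. It assumes big-endian 16-bit values and does not handle the block (tile) ordering of GameCube image data, which this page does not describe.

```python
import struct


def decode_rgb565(value):
    """Expand one 16-bit RGB565 value to an (r, g, b) tuple of 8-bit channels."""
    r = (value >> 11) & 0x1F
    g = (value >> 5) & 0x3F
    b = value & 0x1F
    # Replicate the high bits into the low bits so full scale maps to 255.
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))


def decode_rgb565_run(data, offset, count):
    """Decode count consecutive big-endian RGB565 values (no tile handling)."""
    values = struct.unpack_from(">%dH" % count, data, offset)
    return [decode_rgb565(v) for v in values]
```

Bit replication is used instead of a plain shift so that the maximum 5-bit or 6-bit value maps to 255 and not 248 or 252.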

Palette Structures

Offset Size Format Description
0x00 4 pointer Data Offset (typically 0)
0x04 4 unsigned Format (GXTlutFmt)
Format Stride Description
0x0 0x2 IA8
0x1 0x2 RGB565
0x2 0x2 RGB5A3

NOTE: may follow RGB4A3 instead of A3RGB4

0x08 4 unsigned Name (GXTlut)
0x0C 2 unsigned Color Count
0x0E 2 unsigned Padding

LOD Structure

Offset Size Format Description
0x00 4 unsigned Min Filter (GXTexFilter)
0x04 4 float LOD Bias
0x08 1 unsigned Bias Clamp (GXBool)
0x09 1 unsigned Edge LOD Enable (GXBool)
0x0A 2 unsigned Padding
0x0C 4 unsigned Max Anisotropy (GXAnisotropy)

TEV Structures

Offset Size Format Description
0x00 4 pointer Unknown Offset? Typically 0.
0x04 4 unsigned Unknown. Flags?
0x08 4 unsigned Unknown. Flags?
0x0C 1 unsigned Unknown.
0x0D 1 unsigned Unknown.
0x0E 1 unsigned Unknown.
0x0F 1 unsigned Unknown.
0x10 16 padding?

Color Structures

Offset Size Format Description
0x00 1*4 unsigned RGBA Diffuse
0x04 1*4 unsigned RGBA Ambient
0x08 1*4 unsigned RGBA Specular
0x0C 4 float Transparency (1.0 = opaque)
0x10 4 float Glossiness

Pixel Processing Structures

Offset Size Format Description
0x00 1 unsigned Flags
0x01 1 unsigned Alpha Ref0
0x02 1 unsigned Alpha Ref1
0x03 1 unsigned Destination Alpha
0x04 1 unsigned Type (GXBlendMode)
0x05 1 unsigned Source Factor (GXBlendFactor)
0x06 1 unsigned Destination Factor (GXBlendFactor)
0x07 1 unsigned Blend Op (GXLogicOp)
0x08 1 unsigned Depth Function (GXCompare)
0x09 1 unsigned Alpha Comp0 (GXCompare)
0x0A 1 unsigned Alpha Op (GXAlphaOp)
0x0B 1 unsigned Alpha Comp1 (GXCompare)

Mesh Structures

Offset Size Format Description
0x00 4 pointer String Offset (typically unused)
0x04 4 pointer Next Mesh Struct
0x08 4 pointer Mesh Attribute Struct Array

(parse until a CP_ID of 0xFF)

0x0C 2 unsigned Unknown Flags.
0x0E 2 unsigned Display List size *32
0x10 4 pointer Display List Data Offset
0x14 4 pointer Position/Normal Influence Matrix Array

(parse the array until 0x00000000)

Mesh Attribute Structures

Data: (HexEdit-styled)
    00 00 00 09  00 00 00 03  00 00 00 01  00 00 00 03
    0B 00 00 06  00 00 00 00

Offset Size Format Description
0x00 4 unsigned CP_ID
Enum Description
0x0 Position/Normal Influence Matrix ID
0x1 UV[0] Influence Matrix ID
0x2 UV[1] Influence Matrix ID
0x3 UV[2] Influence Matrix ID
0x4 UV[3] Influence Matrix ID
0x5 UV[4] Influence Matrix ID
0x6 UV[5] Influence Matrix ID
0x7 UV[6] Influence Matrix ID
0x8 UV[7] Influence Matrix ID
0x9 Vertex ID/Value
0xA Normal ID/Value
0xB Color[0] ID/Value
0xC Color[1] ID/Value
0xD UV[0] ID/Value
0xE UV[1] ID/Value
0xF UV[2] ID/Value
0x10 UV[3] ID/Value
0x11 UV[4] ID/Value
0x12 UV[5] ID/Value
0x13 UV[6] ID/Value
0x14 UV[7] ID/Value
0x15 Vertex Influence Matrix Array Offset
0x16 Normal Influence Matrix Array Offset
0x17 UV Influence Matrix Array Offset
0x18 Light Influence Matrix Array Offset
0x19 NBT ID/Value
0x04 4 unsigned Component Type
Enum Description
0x0 None
0x1 Direct (value instead of index)
0x2 Index 8-bit
0x3 Index 16-bit
0x08 4 unsigned Component Count
Enum Description
0x0 XY Position
0x1 XYZ Position
0x0 Normal
0x1 Normal Bi-normal Tangent
0x2 Normal or Bi-normal or Tangent
0x0 RGB Color
0x1 RGBA Color
0x0 S Coord
0x1 ST Coord
0x0C 4 unsigned Data Type
Enum Description
0x0 unsigned 8-bit
0x1 signed 8-bit
0x2 unsigned 16-bit
0x3 signed 16-bit
0x4 float

NOTE: if not float, the vector component result is calculated from value / pow( 2.0, Exponent )

Color Format:

Enum Description
0x0 RGB565 (Red 5-bit, Green 6-bit, Blue 5-bit)
0x1 RGB8(88) (Red 8-bit, Green 8-bit, Blue 8-bit)
0x2 RGBX8(888) (Red 8-bit, Green 8-bit, Blue 8-bit, Discarded 8-bit)
0x3 RGBA4(444) (Red 4-bit, Green 4-bit, Blue 4-bit, Alpha 4-bit)
0x4 RGBA6(666) (Red 6-bit, Green 6-bit, Blue 6-bit, Alpha 6-bit)
0x5 RGBA8(888) (Red 8-bit, Green 8-bit, Blue 8-bit, Alpha 8-bit)
0x10 1 unsigned Divisor (Floating Point Exponent for int data types)
0x11 1 unsigned Unknown.
0x12 2 unsigned Stride
0x14 4 pointer Data Offset
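
The attribute array can be read with a loop like the one sketched here, which also applies the value / pow(2.0, Exponent) rule from the note above. It assumes big-endian fields and that each entry is 0x18 bytes, as the table implies; the dictionary keys and function names are illustrative. The sample bytes are the ones shown at the top of this section.

```python
import struct

ATTR_SIZE = 0x18   # size of one attribute struct per the table above
END_CP_ID = 0xFF   # the array ends at an entry with this CP_ID
KEYS = ("cp_id", "comp_type", "comp_count", "data_type",
        "divisor", "unknown", "stride", "data_ptr")


def parse_attributes(data, offset):
    """Read attribute structs until one with a CP_ID of 0xFF is reached."""
    attrs = []
    while offset + ATTR_SIZE <= len(data):
        fields = struct.unpack_from(">4I2BHI", data, offset)
        if fields[0] == END_CP_ID:
            break
        attrs.append(dict(zip(KEYS, fields)))
        offset += ATTR_SIZE
    return attrs


def scale_component(raw, data_type, divisor):
    """Apply value / 2**divisor to integer types; floats (type 4) pass through."""
    return raw if data_type == 4 else raw / float(2 ** divisor)


sample = bytes.fromhex("00000009 00000003 00000001 00000003 0B000006 00000000")
terminator = struct.pack(">I", END_CP_ID) + bytes(ATTR_SIZE - 4)
attrs = parse_attributes(sample + terminator, 0)
```

For the sample entry this gives a Vertex attribute (CP_ID 0x9) with 16-bit indexes, signed 16-bit data, a divisor of 11 and a stride of 6, so a raw value of 2048 scales to 1.0.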

Position/Normal Influence Matrix Array

Offset Size Format Description
0x00 4 pointer Weight Struct Array (see data below)
0x## 4 end 0x00000000

Data: (HexEdit-styled)
    00 00 F3 70  3F 80 00 00  00 00 00 00  00 00 00 00

Weight Structure

Offset Size Format Description
0x00 4 pointer Bone Struct Offset
0x04 4 float Weight
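
The sample data above can be decoded as a list of (bone pointer, weight) pairs. The stop condition used here, a zero bone pointer, is an assumption inferred from the trailing zero bytes in the sample; the function name is illustrative.

```python
import struct


def parse_weights(data, offset):
    """Read (bone pointer, weight) pairs until a zero bone pointer."""
    weights = []
    while offset + 8 <= len(data):
        bone_ptr, weight = struct.unpack_from(">If", data, offset)
        if bone_ptr == 0:  # assumed terminator, inferred from the sample data
            break
        weights.append((bone_ptr, weight))
        offset += 8
    return weights


# The 16 sample bytes shown above.
sample = bytes.fromhex("0000F370 3F800000 00000000 00000000")
weights = parse_weights(sample, 0)
```

The sample yields a single entry: the bone struct at 0xF370 with a weight of 1.0.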

The Bone Struct Offset is used to dereference the already existing bone struct in memory to get its world bind matrix and multiply it with its inverse world bind matrix.
The product of the multiplication is added to a zero matrix, which makes up a single influence matrix in the above array.

The P/N Influence array is used when a mesh uses CP attribute 0, where the first value in a facepoint/vertex is divided by 3 to get the matrix index.
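
A rough sketch of the accumulation described above, using nested lists as 4x4 matrices. Two details are assumptions not confirmed by the text: each product is scaled by its weight before being added, and the multiplication order is world matrix times inverse bind matrix. All names are illustrative.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def influence_matrix(entries):
    """Accumulate weight * (world * inverse_bind) into a zero matrix.

    entries: iterable of (world, inverse_bind, weight) with 4x4 matrices.
    Scaling by the weight and the multiplication order are assumptions.
    """
    total = [[0.0] * 4 for _ in range(4)]
    for world, inv_bind, weight in entries:
        product = mat_mul(world, inv_bind)
        for r in range(4):
            for c in range(4):
                total[r][c] += weight * product[r][c]
    return total


def matrix_index(first_value):
    """The first facepoint value divided by 3 gives the influence matrix index."""
    return first_value // 3
```

With a single fully weighted bone whose world and inverse bind matrices cancel out, the result is the identity matrix, which leaves the vertex unchanged.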


The animation data has yet to be documented, but it can be found in Pl**Aj.dat files. These files are themselves archives of DAT file data, and the contained "files" hold the animation data.

More structures to be documented.

Kirby Air-Ride

Unknown_2 Structures (Root Structure)

Offset Size Format Description
0x00 4 pointer Unknown Offset.
0x04 4 pointer Unknown_3 Struct
0x08 4 pointer Unknown_4 Struct Offset
0x0C 4 pointer Unknown. Matrix
0x10 4 pointer Unknown_5 Struct Offset.
0x14 4 pointer Unknown Offset.
0x18 4 pointer Unknown_6 Struct Offset.
0x1C 20 Unknown. Padding?

Unknown_3 Structures

Offset Size Format Description
0x00 4 pointer Unknown Single Bone Struct.
0x04 4 unsigned Unknown.
0x08 4 unsigned Unknown.
0x0C 4 unsigned Unknown.
0x10 4 pointer Unknown_7 Struct Offset (attributes?).
0x14 4 pointer Unknown_7 Struct Offset (attributes?).
0x18 4 pointer Unknown_7 Struct Offset (attributes?).
0x1C 4 pointer Unknown_7 Struct Offset (attributes?).
0x20 4 pointer Unknown_7 Struct Offset (attributes?).
0x24 4 pointer Unknown_7 Struct Offset (attributes?).
0x28 4 pointer Bone Struct

More structures to be documented.


HexEdit 5.0 - an advanced hex editor with template support.

To make this download work with colored data highlights you'll need to patch BinaryFileFormat.dtd with the one from HexEdit Pro here:

HAL_DAT template for HexEdit 5.0
Not Available, a new template is in the works.


(old) http://smashboards.com/threads/melee-dat-format.292603/