HAL DAT (File Format)
Introduction
This page describes the HAL Labs DAT file format, as found in games such as Super Smash Bros. Melee and Kirby Air Ride. It is also used in Mario Kart Arcade GP, Mario Kart Arcade GP 2, Kirby's Return to Dream Land, and even Wii Channel TV. These files also turn up in a variety of other games, inside containers or under compression. Games that use these files also use sysdolphin, a library for the GameCube and Wii that sped up game development by handling models, textures, cameras, and similar resources.
NOTE: the layout of this format is extremely simple, which paradoxically makes it hard to interpret, and quite a lot of it has yet to be deciphered.
Beginner information
If you are unfamiliar with how game resource archives are built up, please read Xentax's Definitive Guide to Exploring File Formats. It covers what you need to know before starting to work out how HAL DAT files are laid out.
File Format
The HAL Labs. HSD-Archive (*.dat) format can be described as an unordered multi-root hierarchy (tree) of various data structures.
All pointers are relative to the Data Block (0x20) except for string pointers.
NOTE: "pointer" and "offset" mean the same thing here; "offset" is only used when describing the data found at that location.
File Header
Offset | Size | Format | Description |
---|---|---|---|
0x00 | 4 | unsigned | File Size |
0x04 | 4 | pointer | Pointer Table Offset |
0x08 | 4 | unsigned | Pointer Count |
0x0C | 4 | unsigned | Root Node Count |
0x10 | 4 | unsigned | Reference Node Count |
0x14 | 4 | string | Unknown. "001B" when used. |
0x18 | 4 | unsigned | Unknown. Padding? |
0x1C | 4 | unsigned | Unknown. Padding? |
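As a rough illustration of the header table above, the fields can be unpacked in one call. This is a minimal Python 3 sketch, not the page's UMC-script code: it assumes big-endian values (suggested by the `bu32` reads used later on this page), and the names `DatHeader` and `parse_header` are made up for the example. The sample bytes are synthetic.

```python
import struct
from collections import namedtuple

HEADER_SIZE = 0x20  # the Data Block starts right after the header

# Field names are illustrative; they mirror the table rows 0x00..0x14.
DatHeader = namedtuple(
    "DatHeader",
    "file_size pointer_table_offset pointer_count root_count ref_count version",
)

def parse_header(data):
    """Unpack the first six header fields (big-endian is an assumption)."""
    if len(data) < HEADER_SIZE:
        raise ValueError("file is shorter than the 0x20-byte header")
    # 5 unsigned 32-bit values followed by a 4-byte string; the last
    # 8 bytes (0x18..0x1F) are unknown/padding and are skipped here.
    return DatHeader(*struct.unpack_from(">5I4s", data, 0))

# Synthetic header for illustration only; not taken from a real file.
sample = struct.pack(">5I4s8x", 0x40, 0x10, 2, 1, 0, b"001B")
header = parse_header(sample)
print(header.pointer_count, header.version)
```

Because the Pointer Table Offset is relative to the Data Block, the absolute position of the table in the file would be `HEADER_SIZE + header.pointer_table_offset`.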
Data Block
The Data Block consists of the data, structures, and string table in the archive.
String Table
The String Table is an array of 0-terminated strings which the string pointers in some structs point to directly.
This array is not aligned (or allocated), and the pointer table begins directly after the last 0-termination byte.
Pointer Table
The Pointer Table lists the location of every valid pointer in the data block. A pointer value of 0x00000000 counts as a null pointer unless its location is listed in this table.
The entries appear in exactly the same order as the pointers themselves appear in the data block.
Offset | Size | Format | Description |
---|---|---|---|
Pointer Table Offset | Pointer Count * 4 | pointer | Pointer Offset |
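The table can be read as a flat array and each entry followed to the pointer it names. The sketch below is Python 3 under the same assumptions as the rest of this page (big-endian, everything relative to the Data Block at 0x20); `read_pointer_table` is a hypothetical helper name and the sample bytes are synthetic.

```python
import struct

DATA_BLOCK = 0x20  # all non-string pointers are relative to this position

def read_pointer_table(data, table_offset, pointer_count):
    """Return (locations, targets) as absolute file offsets.

    locations: where each pointer is stored in the data block
    targets:   where each of those pointers leads
    """
    base = DATA_BLOCK + table_offset
    entries = struct.unpack_from(">%dI" % pointer_count, data, base)
    locations, targets = [], []
    for entry in entries:
        (value,) = struct.unpack_from(">I", data, DATA_BLOCK + entry)
        locations.append(DATA_BLOCK + entry)
        targets.append(DATA_BLOCK + value)
    return locations, targets

# Synthetic file: 0x20 header bytes, a data block holding one pointer
# (value 4) plus four data bytes, then a one-entry pointer table.
data = bytes(0x20) + struct.pack(">II", 4, 0xDEADBEEF) + struct.pack(">I", 0)
locations, targets = read_pointer_table(data, table_offset=8, pointer_count=1)
print(locations, targets)
```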
Root Nodes
This is where our journey through the data begins as these contain the data and string offsets.
Offset | Size | Format | Description |
---|---|---|---|
0x00 | 4 | pointer | Data Offset |
0x04 | 4 | pointer | String Offset (relative to String Table) |
Reading the strings means reading until a stop character of '\x00'.
Reference Nodes
Yet to be seen.
Following these there is another String Table, which has no definite size and no padding.
The strings seem to be used as identifier keys for the data (similar to how a Python dict or JavaScript object identifies its data).
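To make the dictionary analogy concrete, the root nodes can be collected into a name-to-offset mapping. This is a Python 3 sketch with several assumptions: values are big-endian, reference nodes use the same 8-byte layout as root nodes (they are undocumented above), and the string table starts directly after the last node. The helper names and the sample name `demo_joint` are hypothetical.

```python
import struct

DATA_BLOCK = 0x20

def read_cstring(data, offset):
    """Read bytes up to (not including) the terminating '\\x00'."""
    end = data.index(b"\x00", offset)
    return data[offset:end].decode("ascii", errors="replace")

def read_root_nodes(data, table_offset, pointer_count, root_count, ref_count=0):
    """Map each root node's name to the absolute offset of its data."""
    nodes_start = DATA_BLOCK + table_offset + pointer_count * 4
    # Assumption: the string table follows the root and reference nodes.
    strings_start = nodes_start + (root_count + ref_count) * 8
    roots = {}
    for i in range(root_count):
        data_off, str_off = struct.unpack_from(">II", data, nodes_start + i * 8)
        roots[read_cstring(data, strings_start + str_off)] = DATA_BLOCK + data_off
    return roots

# Synthetic file: header, 4 data bytes, empty pointer table, one root node.
data = bytes(0x20) + bytes(4) + struct.pack(">II", 0, 0) + b"demo_joint\x00"
roots = read_root_nodes(data, table_offset=4, pointer_count=0, root_count=1)
print(roots)
```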
Structures
Identifying the Root Structure
So far, two methods have been used to decide what the root structure is.
The String Analysis Method
This simple method was designed by Revel8n and only applies to Super Smash Bros. Melee. It analyzes the string of the Root Node and searches it for known keyword patterns to identify the root structure.
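The idea can be sketched as a suffix lookup on the root node's name. The actual keyword list is not given on this page, so the table below is purely hypothetical: the two suffixes are placeholders loosely based on the "joint" and "matanim" names that appear in the path-finder code comments further down, and `classify_root` is an invented helper.

```python
# Hypothetical keyword table; the real patterns are not documented here.
# Longer suffixes come first so "_matanim_joint" is not mistaken for "_joint".
KNOWN_SUFFIXES = [
    ("_matanim_joint", "matanim"),
    ("_joint", "bone"),
]

def classify_root(name):
    """Return a structure kind for a root node name, or None if unknown."""
    for suffix, kind in KNOWN_SUFFIXES:
        if name.endswith(suffix):
            return kind
    return None

print(classify_root("Example_joint"))
```

A name that matches nothing returns `None`, which is the case where a structural test such as the path-finder would have to take over.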
The Path-Finder Method
This more complex method was designed by Tcll. It tests structure paths, sizes, pointer locations, and pointer offsets against pre-defined (known) structures.
Each structure that matches all of these criteria is tried, and the path it belongs to is discarded as soon as any test fails. This method works with Super Smash Bros. Melee, Kirby Air-Ride, and recently Wii Channel TV, with more support to come.
Code: ( Python 2.7 / UMC-script v3.0 )
Full code: not available yet.
Improved method using a decorator class (less work to define structures):
Step 1: Create a dictionary which holds the info for the known structures.
    globals()['structs'] = {
        # struct_name: [
        #     expected_size,
        #     struct_function (for root structs) or _pass,
        #     {
        #         pointer_addr: sub-struct_name,
        #         ...
        #     },
        #     isArray=False
        # ],
        # ...
    }
Step 2: Define the decorator class, which does half the work for us: on init it stores the given info, and when applied it registers the function name together with the function itself (or _pass for non-root structs).
    class structPointers(object):
        """registers structure definitions for the pathfinder

        arguments:
        - size (int): the size of the struct in bytes (-1 for undefined)
        - pointers (dict): location of pointer-value : expected struct name
        - isArray (bool): notifies the path-finder to multiply "size" to match the overflow before testing for false padding
        - root (bool): marks this struct as a root struct

        usage:
        @structPointers( 8, { 0:'_pass', ... } )
        def struct2( offset ): ...

        @structPointers( 32, { 0:'root_struct1', 4:'struct2', ... }, root=True )
        def root_struct1( offset ): ...
        """
        # NOTE: see structs below for more examples.
        def __init__( this, size, pointers={}, isArray=False, root=False ):
            this.size=size; this.pointers = pointers; this.isarray = isArray; this.root = root
        def __call__( this, struct_function ):
            # non-root structs are registered with _pass instead of their function
            globals()['structs'][struct_function.__name__] = [ this.size, struct_function if this.root else _pass , this.pointers, this.isarray ]
            return struct_function
Step 3: Define the structures by decorating each function with its size, its pointer locations and the struct names expected there, isArray, and root.
Note: when isArray is set, the given size is treated as a multiple of the expected size, and only the remainder is tested for padding.
    @structPointers( 64, {
        0:'_pass', # string
        8:'_bone',
        12:'_bone',
        16:'_object',
        56:'_pass' # (IB matrix)
    }, root=True)
    def _bone(Bone_Offset, parent=None, prev=None, rig='joint_obj'):
        pass # file operation goes here
Step 4: Gather and sort the pointers to each structure using the pointer table and root structure pointers.
    relocation_array = array(bu32,count=pointer_cnt); relocation_array.__color__ = 0x40FF40
    relocations = relocation_array(offset=pointer_tbl,label={' -- relocations':' -- pointer-address'})
    pointers = {pointer_tbl} # using a set to efficiently remove duplicate entries
    for addr in relocations:
        jump(addr+32)
        p = bu32( label=' -- Pointer' ); p.__color__ = 0xBBFFBB
        pointers.add(p+32)
    for i in range(root_cnt):
        jump(pointer_tbl+(pointer_cnt*4)+(i*8))
        p = bu32( label=' -- Root Pointer' ); p.__color__ = 0xFFDDBB
        pointers.add(p+32)
Step 5: Find a valid path
    def test_path(struct_name, given_size, struct_offset):
        """walker to validate the path recursively"""
        expected_size, func, ptrs, isArray = structs[struct_name]
        if expected_size == -1: return True # allow ignorance
        if given_size != expected_size:
            if given_size > expected_size:
                padSize = given_size%expected_size if isArray else given_size-expected_size
                if sum(array(bu8,count=padSize,offset=struct_offset+(given_size-padSize))( label=' -- pad-byte validity')):
                    print(' invalid path: given size %i > expected size %i for struct %s'%(given_size,expected_size,struct_name)); return False
            # this error shouldn't occur, but just in case, it's better to catch it anyway...
            else: print(' invalid path: given size %i < expected size %i for struct %s'%(given_size,expected_size,struct_name)); return False
        pid = 0
        jump(struct_offset, label=' -- validating struct %s'%struct_name)
        for i in range(expected_size): # !important: test each byte of the structure
            if i in ptrs:
                name = ptrs[i]
                pointer=struct_offset+i
                jump(pointer)
                location = bu32(label=' -- pointer %i of struct %s'%(pid,struct_name))+32
                pid += 1
                if pointer-32 not in relocations:
                    if location==32: continue # 0-pointer
                    else: print(' invalid path: 0-pointer expected, but found data at location %i for struct %s.'%(i,struct_name)); return False
                if location in pointers: size = min([address for address in pointers if address>location])-location
                else: print(" invalid path: couldn't determine size of sub-struct %s at location %i for struct %s."%(name,i,struct_name)); return False
                if not test_path(name, size, location): return False
            elif (struct_offset+i)-32 in relocations:
                print(' invalid path: pointer found in expected data-space at location %i for struct %s.'%(i,struct_name)); return False
        return True

    # TODO: test 2nd-priority root structures of non-fixed size, such as raw image data.
    # TODO: struct priority (2 structs have the same size, which do we use first?)
    # TODO: gather all root structs, and determine the order from what is given (matanim should be parsed before joint, but yet is supplied afterwards).
    # wisdom from Tcll: you can't trust anything for what it is in this format.
    for i in range(root_cnt):
        jump(pointer_tbl+(pointer_cnt*4)+(i*8), label=' -- Root Structs') # get to the root node
        root_offset = bu32(label=' -- Data Offset')+32
        string_offset = bu32(label=' -- string Offset') # could be a dictionary key: { str(key): value }
        root_size = min([address for address in pointers if address>root_offset])-root_offset # a root pointer should never be the last pointer.
        # ^ note: this is the given size, which could be larger than the expected size.

        # resolve the given root size into a dictionary of possible root structures (by size)
        roots = {root_size:[]} # { 48: ['root5', ... ], 44: ['root8', ... ] } (valid sizes determined from if sum(padding)==0)
        # ^ initial root size included to prevent pad-scanning issues.
        for structname, (structsize, structfunc, structptrs, isArray) in structs.items():
            if structfunc!=_pass: # root structures always have their own function
                if 0<structsize<=root_size:
                    # membership test: the dict must not be modified while it is being iterated
                    if structsize in roots: roots[structsize].append(structname)
                    else:
                        jump(root_offset+structsize, label=' -- validating root struct oversize')
                        pad_bytes = array(bu8, count=root_size-structsize)( label=' -- initial pad-byte validity' )
                        if sum(pad_bytes)==0: roots[structsize] = [structname]

        found = False
        rl = len(roots)
        print('\nfound %i possible size categor%s for root %i of size %i:'%(rl, 'y' if rl == 1 else 'ies', i, root_size))
        for _size, root_names in roots.items():
            rs = len(root_names)
            if root_names:
                print(' scanning %i possible root struct%s of size %i.'%(rs, '' if rs == 1 else 's', _size))
                for ni,root_name in enumerate(root_names,1):
                    print(' %i: %s'%(ni, root_name))
                    if test_path(root_name, _size, root_offset):
                        print(' found a valid path, attempting to parse...')
                        # noinspection PyBroadException
                        try:
                            structs[root_name][1](root_offset); print(' parsing succeeded'); found=True; break
                        except:
                            import sys,traceback # TODO: remove
                            print(' parsing failed, reason:\n'); traceback.print_exception( *sys.exc_info() ); print()
                if found: break
                else: print(' could not find a valid path.')
        if not found: print("could not find anything out of what's currently known.") # Tcll - possible test: if index has something to do with selection
Extra documentation:
    # Tcll - What does this do?
    # Well to put it into perspective (if it looks like a duck and quacks like a duck, then it must be a duck),
    # all we are given is a list of pointer addresses (relocations), and a root pointer (aside from a useless string address).
    # We don't know what this mysterious root pointer points to, so we have to guess,
    # but first we need to collect some information to give us an accurate guess...
    # So we resolve the pointer addresses into a list of struct addresses (or pointers to structs),
    # and add any root pointers to that list (saving the pointer addresses for testing if an expected pointer exists).
    # Now we can make a good guess as to the size of our root struct, as well as the sizes of its children.
    # For some added information, what we have done with the decorator-class above (structPointers),
    # is build a collection of pre-defined structs with their sizes, (relative) pointer locations, and associate functions.
    # Now what we do with our given and expected information is put them together to determine our result.
    # First we start with the root size and gather structures matching or less than the given size.
    # Once we have a struct, if the size is smaller than the given size, we test that the extended data is pad-bytes (0s).
    # If valid, we now test that the expected pointer locations are valid (also searching for invalid pointers).
    # (this is why we saved our pointer addresses mentioned above)
    # Finally, if those are valid, we test that the structs at those pointers are valid,
    # following the same recursive routine until we hit a 'pass'.
    # (which marks either data, something unknown, or a struct with an undetermined size)
    # NOTE: ignorant tests with struct sizes of -1 are in place for unknown data and 'pass'.
    # NOTE: testing arrays of structures is done by modulo-ing (%) the given struct size by the associated expected size,
    # and using the remainder to test for pad-bytes (yes, modulo is the remainder of divide (/)).
    # If that is valid, test the first structure of the array.
    # If all tests are positive, we can call our associated root-struct function.
    # NOTE: (is it really a duck?) just because all tests are positive does not mean we have a valid structure-path.
    # Note that the testing done is only based on what is known, so certain things can easily look like,
    # and be mistaken for other things without enough collected information (like the current matanim struct for example).
    # Unfortunately there is not much of a discrete way to validate the data in structs without doing something overly complex,
    # meaning testing would take hours due to the verbosity of the testing depth.
    # So it is left up to an exception to catch the invalid data and attempt to try another root struct.
    # This can be a double-edged sword though as the path may be correct,
    # while the invalid data is just something left to be figured out, causing the path-finder to choke.
    # (at this point, any invalid data added to UGE's backend is not removed, the data is simply dealt with until finished)
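The two core measurements described above (the "given size" of a struct and the pad-byte test) can be isolated from the UMC-script calls. The following is a standalone Python 3 sketch, not Tcll's implementation: `infer_size` and `trailing_is_padding` are hypothetical names, the address list is invented, and the expected size is assumed to be a positive number.

```python
import bisect

def infer_size(sorted_addresses, location):
    """Given size = distance from `location` to the next known address.

    Returns None when `location` is the last known address, in which
    case the size cannot be determined this way.
    """
    idx = bisect.bisect_right(sorted_addresses, location)
    if idx == len(sorted_addresses):
        return None
    return sorted_addresses[idx] - location

def trailing_is_padding(data, offset, given_size, expected_size, is_array=False):
    """True if the bytes beyond the expected size are all zero.

    For arrays only the remainder (given % expected) is treated as padding.
    """
    if given_size < expected_size:
        return False
    pad = given_size % expected_size if is_array else given_size - expected_size
    tail = data[offset + given_size - pad : offset + given_size]
    return not any(tail)

# Invented addresses for illustration: a struct at 0x20, the next at 0x60.
addresses = sorted({0x20, 0x60, 0x70})
print(hex(infer_size(addresses, 0x20)))
```

Keeping the addresses sorted and using `bisect` avoids rescanning the whole pointer set for every struct, which the `min([...])` expressions above do on each call.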
Structure Layout
Currently, only a little is known about the full structure layout, but here is what is known so far.
Mesh layout: (SSBM Pl*.dat, Ty*.dat)
Root
└ Bone
  ├ Bone
  └ Object
    ├ Material
    │ ├ Colors
    │ └ Texture
    │   ├ Image
    │   │ └ (Image Data)
    │   ├ Palette
    │   │ └ (Palette Data)
    │   ├ Unknown1
    │   └ Texture
    ├ Mesh
    │ ├ Attributes
    │ │ └ (Vector Data)
    │ ├ Influence Matrix Array
    │ │ └ Weight Array
    │ │   └ Weight
    │ └ Display List
    │   └ (Sub-Vector Data (if any) and/or Indexes)
    └ Object
Structure Definitions
(As found in Super Smash Bros. Melee)
Bone Structures (Root Structure)
Offset | Size | Format | Description |
---|---|---|---|
0x00 | 4 | pointer | String Offset (typically unused) |
0x04 | 4 | unsigned | Unknown Flags. |
0x08 | 4 | pointer | Child Bone Struct Offset |
0x0C | 4 | pointer | Next Bone Struct Offset |
0x10 | 4 | pointer | Object Struct Offset |
0x14 | 4 | float | Rotation X |
0x18 | 4 | float | Rotation Y |
0x1C | 4 | float | Rotation Z |
0x20 | 4 | float | Scale X |
0x24 | 4 | float | Scale Y |
0x28 | 4 | float | Scale Z |
0x2C | 4 | float | Location X |
0x30 | 4 | float | Location Y |
0x34 | 4 | float | Location Z |
0x38 | 4 | pointer | Inverse Bind Matrix Offset (use parent matrix if 0) |
0x3C | 4 | pointer | Unknown. |
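The bone table maps directly onto a single unpack format of 0x40 bytes. This Python 3 sketch assumes big-endian data; the `Bone` tuple, its field names, and `parse_bone` are invented for the example, and the sample values are synthetic.

```python
import struct
from collections import namedtuple

# 5 x 32-bit pointers/flags, 9 floats (rotation, scale, location), 2 pointers
BONE_FORMAT = ">5I9f2I"

Bone = namedtuple(
    "Bone",
    "name_ptr flags child next obj rotation scale location inverse_bind unknown",
)

def parse_bone(data, offset):
    """Unpack one bone struct; pointers stay relative to the Data Block."""
    f = struct.unpack_from(BONE_FORMAT, data, offset)
    return Bone(f[0], f[1], f[2], f[3], f[4],
                f[5:8], f[8:11], f[11:14], f[14], f[15])

# Synthetic bone: child at 0x40, object at 0x80, unit scale.
sample = struct.pack(BONE_FORMAT, 0, 0, 0x40, 0, 0x80,
                     0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0, 0)
bone = parse_bone(sample, 0)
print(hex(bone.child), bone.scale)
```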
Inverse Matrix (3x4)
Offset | Size | Format | Description |
---|---|---|---|
0x00 | 4 | float | Rotation 1 1 |
0x04 | 4 | float | Rotation 1 2 |
0x08 | 4 | float | Rotation 1 3 |
0x0C | 4 | float | Translation X |
0x10 | 4 | float | Rotation 2 1 |
0x14 | 4 | float | Rotation 2 2 |
0x18 | 4 | float | Rotation 2 3 |
0x1C | 4 | float | Translation Y |
0x20 | 4 | float | Rotation 3 1 |
0x24 | 4 | float | Rotation 3 2 |
0x28 | 4 | float | Rotation 3 3 |
0x2C | 4 | float | Translation Z |
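Since each row of the table holds three rotation terms followed by one translation term, the twelve floats can be read row by row. The sketch below (Python 3, big-endian assumed) also appends a `[0, 0, 0, 1]` row to obtain a 4x4 matrix; that last row is an assumption of the example and is not stored in the file. `read_matrix_3x4` is a hypothetical name.

```python
import struct

def read_matrix_3x4(data, offset):
    """Read 12 floats as three rows of [r1, r2, r3, translation]."""
    values = struct.unpack_from(">12f", data, offset)
    rows = [list(values[r * 4 : r * 4 + 4]) for r in range(3)]
    rows.append([0.0, 0.0, 0.0, 1.0])  # assumed affine bottom row
    return rows

# Synthetic matrix: identity rotation with a translation of (1, 2, 3).
sample = struct.pack(">12f",
                     1, 0, 0, 1,
                     0, 1, 0, 2,
                     0, 0, 1, 3)
matrix = read_matrix_3x4(sample, 0)
print(matrix)
```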
Object Structures
Offset | Size | Format | Description |
---|---|---|---|
0x00 | 4 | pointer | String Offset (typically unused) |
0x04 | 4 | pointer | Next Object Struct Offset |
0x08 | 4 | pointer | Material Struct Offset |
0x0C | 4 | pointer | Mesh Struct Offset |
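Several structures here (Object, Texture, Mesh) chain to a sibling through a "Next ... Struct Offset" field at 0x04. A generic walker for such chains might look like the Python 3 sketch below. It treats an offset of 0 as the end of the chain, which is a simplification of the null-pointer rule from the Pointer Table section, and it guards against cycles. `walk_chain` is a hypothetical helper and the sample data is synthetic.

```python
import struct

DATA_BLOCK = 0x20

def walk_chain(data, first_offset, next_field=0x04):
    """Yield the data-block offsets of each struct in a sibling chain."""
    seen = set()
    offset = first_offset
    while offset and offset not in seen:  # stop on null or on a loop
        seen.add(offset)
        yield offset
        (offset,) = struct.unpack_from(">I", data, DATA_BLOCK + offset + next_field)

# Synthetic data block: 16 unused bytes, object A at 0x10 (next = 0x20),
# object B at 0x20 (next = 0).
data = (bytes(0x20) + bytes(0x10)
        + struct.pack(">4I", 0, 0x20, 0, 0)
        + struct.pack(">4I", 0, 0, 0, 0))
chain = list(walk_chain(data, 0x10))
print([hex(o) for o in chain])
```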
Material Structures
Offset | Size | Format | Description |
---|---|---|---|
0x00 | 4 | pointer | String Offset (typically unused) |
0x04 | 4 | unsigned | Unknown Render Mode Flags |
0x08 | 4 | pointer | Texture Struct Offset |
0x0C | 4 | pointer | Colors Struct Offset |
0x10 | 4 | pointer | Unknown Render Struct Offset |
0x14 | 4 | pointer | Pixel Processing Struct Offset |
Texture Structures
Offset | Size | Format | Description |
---|---|---|---|
0x00 | 4 | pointer | String Offset (typically unused) |
0x04 | 4 | pointer | Next Texture Struct Offset |
0x08 | 4 | unsigned | GXTexMapID |
0x0C | 4 | unsigned | GXTexGenSrc |
0x10 | 4 | float | Rotation X |
0x14 | 4 | float | Rotation Y |
0x18 | 4 | float | Rotation Z |
0x1C | 4 | float | Scale X |
0x20 | 4 | float | Scale Y |
0x24 | 4 | float | Scale Z |
0x28 | 4 | float | Translation X |
0x2C | 4 | float | Translation Y |
0x30 | 4 | float | Translation Z |
0x34 | 4 | unsigned | Wrap S |
0x38 | 4 | unsigned | Wrap T |
0x3C | 1 | unsigned | Repeat S |
0x3D | 1 | unsigned | Repeat T |
0x3E | 2 | unsigned | Padding |
0x40 | 4 | unsigned | Unknown Flags. |
0x44 | 4 | float | Blending |
0x48 | 4 | unsigned | Mag Filter (GXTexFilter) |
0x4C | 4 | pointer | Image Struct Offset |
0x50 | 4 | pointer | Palette Struct Offset |
0x54 | 4 | pointer | LOD Struct Offset |
0x58 | 4 | pointer | TEV Struct Offset |
Image Structures
Offset | Size | Format | Description |
---|---|---|---|
0x00 | 4 | pointer | Data Offset |
0x04 | 2 | unsigned | Width |
0x06 | 2 | unsigned | Height |
0x08 | 4 | unsigned | Format |
0x0C | 4 | unsigned | Mipmap (GXBool) |
0x10 | 4 | float | Min LOD |
0x14 | 4 | float | Max LOD |
Palette Structures
Offset | Size | Format | Description |
---|---|---|---|
0x00 | 4 | pointer | Data Offset (typically 0) |
0x04 | 4 | unsigned | Format (GXTlutFmt) |
0x08 | 4 | unsigned | Name (GXTlut) |
0x0C | 2 | unsigned | Color Count |
0x0E | 2 | unsigned | Padding |
LOD Structure
Offset | Size | Format | Description |
---|---|---|---|
0x00 | 4 | unsigned | Min Filter (GXTexFilter) |
0x04 | 4 | float | LOD Bias |
0x08 | 1 | unsigned | Bias Clamp (GXBool) |
0x09 | 1 | unsigned | Edge LOD Enable (GXBool) |
0x0A | 2 | unsigned | Padding |
0x0C | 4 | unsigned | Max Anisotropy (GXAnisotropy) |
Unknown_1 Structures
Offset | Size | Format | Description |
---|---|---|---|
0x00 | 4 | pointer | Unknown Offset? Typically 0. |
0x04 | 4 | unsigned | Unknown. Flags? |
0x08 | 4 | unsigned | Unknown. Flags? |
0x0C | 1 | unsigned | Unknown. |
0x0D | 1 | unsigned | Unknown. |
0x0E | 1 | unsigned | Unknown. |
0x0F | 1 | unsigned | Unknown. |
0x10 | 16 | - | Padding? |
Color Structures
Offset | Size | Format | Description |
---|---|---|---|
0x00 | 1*4 | unsigned | RGBA Diffuse |
0x04 | 1*4 | unsigned | RGBA Ambient |
0x08 | 1*4 | unsigned | RGBA Specular |
0x0C | 4 | float | Transparency (1.0 = opaque) |
0x10 | 4 | float | Shininess |
Pixel Processing Structures
Offset | Size | Format | Description |
---|---|---|---|
0x00 | 1 | unsigned | Flags |
0x01 | 1 | unsigned | Alpha Ref0 |
0x02 | 1 | unsigned | Alpha Ref1 |
0x03 | 1 | unsigned | Destination Alpha |
0x04 | 1 | unsigned | Type (GXBlendMode) |
0x05 | 1 | unsigned | Source Factor (GXBlendFactor) |
0x06 | 1 | unsigned | Destination Factor (GXBlendFactor) |
0x07 | 1 | unsigned | Blend Op (GXLogicOp) |
0x08 | 1 | unsigned | Depth Function (GXCompare) |
0x09 | 1 | unsigned | Alpha Comp0 (GXCompare) |
0x0A | 1 | unsigned | Alpha Op (GXAlphaOp) |
0x0B | 1 | unsigned | Alpha Comp1 (GXCompare) |
Mesh Structures
Offset | Size | Format | Description |
---|---|---|---|
0x00 | 4 | pointer | String Offset (typically unused) |
0x04 | 4 | pointer | Next Mesh Struct Offset |
0x08 | 4 | pointer | Mesh Attributes Struct Array Offset (parse until a CP_ID of 0xFF) |
0x0C | 2 | unsigned | Unknown Flags. |
0x0E | 2 | unsigned | Display List Size (multiply by 32 for the size in bytes) |
0x10 | 4 | pointer | Display List Data Offset |
0x14 | 4 | pointer | Influence Matrix Array Offset (parse the array until 0x00000000) |
Mesh Attribute Structures
Data: (HexEdit-styled)
00 00 00 09 00 00 00 03 00 00 00 01 00 00 00 03
0B 00 00 06 00 00 00 00
Offset | Size | Format | Description |
---|---|---|---|
0x00 | 4 | unsigned | CP_ID |
0x04 | 4 | unsigned | CP_Type |
0x08 | 4 | unsigned | Component Count |
0x0C | 4 | unsigned | Data Type/Format (colors use their own value list) |
0x10 | 1 | unsigned | Divisor (Floating Point Exponent for int data types) |
0x11 | 1 | unsigned | Unknown. |
0x12 | 2 | unsigned | Stride |
0x14 | 4 | pointer | Data Offset |
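The 24 sample bytes shown above can be decoded against this table. The Python 3 sketch below assumes big-endian data; `parse_attribute` and `dequantize` are hypothetical names. Reading the "Divisor" as a power-of-two exponent (stored value divided by 2 to the power of the divisor) is an interpretation of the table's description, not something the page states outright.

```python
import struct

ATTR_FORMAT = ">4I2BHI"  # 4 x u32, divisor, unknown, stride, data offset

# The sample bytes from the "Data" listing above.
sample = bytes.fromhex("00000009 00000003 00000001 00000003 0B000006 00000000")

def parse_attribute(data, offset=0):
    """Unpack one Mesh Attribute struct into a dict keyed by field name."""
    names = ("cp_id", "cp_type", "component_count", "data_type",
             "divisor", "unknown", "stride", "data_offset")
    return dict(zip(names, struct.unpack_from(ATTR_FORMAT, data, offset)))

def dequantize(raw, divisor):
    """Assumed fixed-point conversion: raw / 2**divisor."""
    return raw / float(1 << divisor)

attr = parse_attribute(sample)
print(attr)
```

For this sample the struct describes an attribute with CP_ID 9, a divisor of 11, and a stride of 6 bytes; an array of such structs would be read in 0x18-byte steps until a CP_ID of 0xFF is reached.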
Weight Structure Arrays
Data: (HexEdit-styled)
00 00 F3 70 3F 80 00 00 00 00 00 00 00 00 00 00
Offset | Size | Format | Description |
---|---|---|---|
0x00 | 4 | pointer | Bone Struct Offset |
0x04 | 4 | float | Weight |
The Bone Struct Offset is used to dereference the already existing bone struct to get its inverse-bind matrix.
More structures to be documented.
(As found in Kirby Air-Ride)
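The 16 sample bytes above decode to a single weight followed by eight zero bytes. The Python 3 sketch below reads entries until the Bone Struct Offset is 0; treating that as the array terminator is an assumption drawn from the sample, and `read_weights` is a hypothetical name. Big-endian data is assumed.

```python
import struct

# The sample bytes from the "Data" listing above.
sample = bytes.fromhex("0000F370 3F800000 00000000 00000000")

def read_weights(data, offset=0):
    """Return [(bone_struct_offset, weight), ...] up to a zero bone offset."""
    weights = []
    while offset + 8 <= len(data):
        bone_ptr, weight = struct.unpack_from(">If", data, offset)
        if bone_ptr == 0:  # assumed terminator
            break
        weights.append((bone_ptr, weight))
        offset += 8
    return weights

weights = read_weights(sample)
print(weights)
```

Here the one entry points at the bone struct at 0xF370 with a weight of 1.0, meaning the vertex follows that bone entirely.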
Unknown_2 Structures (Root Structure)
Offset | Size | Format | Description |
---|---|---|---|
0x00 | 4 | pointer | Unknown Offset. |
0x04 | 4 | pointer | Unknown_3 Struct Offset |
0x08 | 4 | pointer | Unknown_4 Struct Offset |
0x0C | 4 | pointer | Unknown. Matrix 3x4 Offset. |
0x10 | 4 | pointer | Unknown_5 Struct Offset. |
0x14 | 4 | pointer | Unknown Offset. |
0x18 | 4 | pointer | Unknown_6 Struct Offset. |
0x1C | 20 | - | Unknown. Padding? |
Unknown_3 Structures
Offset | Size | Format | Description |
---|---|---|---|
0x00 | 4 | pointer | Unknown Single Bone Struct Offset. |
0x04 | 4 | unsigned | Unknown. |
0x08 | 4 | unsigned | Unknown. |
0x0C | 4 | unsigned | Unknown. |
0x10 | 4 | pointer | Unknown_7 Struct Offset (attributes?). |
0x14 | 4 | pointer | Unknown_7 Struct Offset (attributes?). |
0x18 | 4 | pointer | Unknown_7 Struct Offset (attributes?). |
0x1C | 4 | pointer | Unknown_7 Struct Offset (attributes?). |
0x20 | 4 | pointer | Unknown_7 Struct Offset (attributes?). |
0x24 | 4 | pointer | Unknown_7 Struct Offset (attributes?). |
0x28 | 4 | pointer | Bone Struct Offset |
More structures to be documented.
Animation
The animation data has yet to be looked into in any depth, but in Melee it can be found in Pl**Aj.dat files. These files are themselves archives of embedded DAT file data, and those embedded "files" contain the animation data.
Resources
HexEdit 5.0 - an advanced hex editor with template support.
HAL_DAT template for HexEdit 5.0
Current: broken link removed, will be restored in time.
WARNING: the pointer count must be under 384 (0x180) or HexEdit will freeze.
For reference, this older template works on larger files but breaks on most.
Old: broken link removed, will be restored in time.
References
(old) http://smashboards.com/threads/melee-dat-format.292603/