PAK

From Mania Tech Wiki

Revision as of 21:14, 15 June 2017

TrackMania .pak files are archives that contain a collection of other files, much like .zip archives. They are found in the "Packs" folder in the game installation directory. In ManiaPlanet, there are also .Pack.Gbx files with the same purpose.

In older TM versions, .pak files are both compressed and encrypted (version 3); even the file index containing the file names and directory structure is encrypted. In ManiaPlanet, they also include an uncompressed, unencrypted section (versions 6+).

Encryption

.pak files are encrypted using Blowfish in CBC mode with a 16-byte key. When an encrypted section begins, decryption is initialized by reading an 8-byte plaintext IV (initialization vector) from the file. Blowfish CBC decryption then proceeds from the following byte.

Each .pak file has its own encryption key. The keys for the different packs are found in packlist.dat.

There is one gotcha where Nadeo deviates from standard CBC: on the first read, and after every 256 bytes read, the current IV is XORed with a value we'll call ivXor (this happens before the IV is applied to the Blowfish-decrypted block). ivXor is initialized to zero and is reset to zero every time it is applied, so most of the time it has no effect. Crucially, though, it is assigned a non-zero value while the .pak header is being read, and its effect usually kicks in somewhere in the middle of the file list. If you don't take this into account, half of your file table will be garbled (which is likely what Nadeo intended with this trick). The same applies to .gbx files embedded in packs.
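The modified chaining can be sketched in C as follows. This is a minimal illustration, not Nadeo's code: the Blowfish block function is abstracted as a hypothetical callback (`identity_decrypt` is only a do-nothing stand-in for exercising the chaining), and all type and function names are made up. Two details are assumptions: that "every 256 bytes" is counted in decrypted bytes from the start of the encrypted section, and that the 64-bit ivXor is XORed into the IV in little-endian byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical block-cipher callback: decrypts one 8-byte block in place.
   A real reader would call Blowfish here, keyed with the pack's key. */
typedef void (*block_decrypt_fn)(uint8_t block[8], void *ctx);

/* Do-nothing stand-in, only useful for exercising the chaining logic. */
static void identity_decrypt(uint8_t block[8], void *ctx)
{
    (void)block;
    (void)ctx;
}

typedef struct {
    uint8_t iv[8];            /* current CBC chaining value */
    uint64_t iv_xor;          /* pending ivXor, cleared once applied */
    size_t bytes_read;        /* decrypted bytes produced so far */
    block_decrypt_fn decrypt;
    void *ctx;
} pak_cbc_state;

/* Decrypts one 8-byte block in place: plain CBC plus the ivXor twist. */
static void pak_cbc_decrypt_block(pak_cbc_state *s, uint8_t block[8])
{
    uint8_t cipher[8];
    memcpy(cipher, block, 8);          /* keep ciphertext for chaining */

    /* First read and every 256 bytes: fold ivXor into the IV, then reset it.
       Little-endian byte order here is an assumption. */
    if (s->bytes_read % 256 == 0) {
        for (int i = 0; i < 8; i++)
            s->iv[i] ^= (uint8_t)(s->iv_xor >> (8 * i));
        s->iv_xor = 0;
    }

    s->decrypt(block, s->ctx);         /* raw block decryption */
    for (int i = 0; i < 8; i++)
        block[i] ^= s->iv[i];          /* CBC: XOR with the IV */
    memcpy(s->iv, cipher, 8);          /* next IV = this ciphertext */
    s->bytes_read += 8;
}
```

Because ivXor is cleared as soon as it is folded in, any value assigned between two 256-byte boundaries only affects the block that starts at the next boundary.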

File structure

A description of the data types used can be found on the GBX page.

Header version 3

  • byte magic[8]: "NadeoPak"
  • uint32 version (3)
  • uint64 headerIV
  • Blowfish encrypted:
    • uint128 headerMD5
    • uint32 metaDataOffset
    • uint32 dataOffset
    • if version >= 2
      • uint32 metaDataUncompressedSize
      • uint32 metaDataCompressedSize
    • if version >= 3
      • uint128
    • uint32 flags
    • uint32 numFolders
    • FolderEntry folders[numFolders]
      • uint32 parentFolderIndex (index into folders; -1 if this is a root folder)
      • string name
    • Set up ivXor
    • uint32 numFiles
    • FileEntry files[numFiles]
      • uint32 folderIndex (index into folders)
      • string name
      • uint32
      • uint32 uncompressedSize
      • uint32 compressedSize
      • uint32 offset
      • uint32 classID (indicates the type of the file)
      • uint64 flags
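Because each folder only stores its parent's index, a file's full path has to be rebuilt by walking up the parent chain. A minimal C sketch, assuming the folder and file names have already been decrypted and decoded into NUL-terminated strings without trailing separators; `/` is used as the separator purely for illustration, the depth limit is arbitrary, and `folder_entry` and `pak_file_path` are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAK_NO_PARENT 0xFFFFFFFFu   /* parentFolderIndex -1 as a uint32 */
#define PAK_MAX_DEPTH 64            /* assumed nesting limit for this sketch */

/* Hypothetical in-memory FolderEntry with an already-decoded name. */
typedef struct {
    uint32_t parent;                /* parentFolderIndex */
    const char *name;
} folder_entry;

/* Builds "Root/Sub/.../file_name" into out. Returns 0 on success, -1 on a
   bad index, a parent cycle or a too-small buffer. */
static int pak_file_path(const folder_entry *folders, uint32_t num_folders,
                         uint32_t folder_index, const char *file_name,
                         char *out, size_t out_size)
{
    const char *parts[PAK_MAX_DEPTH];
    size_t depth = 0;

    /* Walk from the file's folder up to a root folder. */
    for (uint32_t f = folder_index; f != PAK_NO_PARENT; f = folders[f].parent) {
        if (f >= num_folders || depth == PAK_MAX_DEPTH)
            return -1;
        parts[depth++] = folders[f].name;
    }

    /* Emit the folder names root-first, then the file name. */
    size_t len = 0;
    while (depth > 0) {
        int n = snprintf(out + len, out_size - len, "%s/", parts[--depth]);
        if (n < 0 || (size_t)n >= out_size - len)
            return -1;
        len += (size_t)n;
    }
    int n = snprintf(out + len, out_size - len, "%s", file_name);
    return (n < 0 || (size_t)n >= out_size - len) ? -1 : 0;
}
```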

Header MD5

The header contains an MD5 hash of itself for integrity checking. To calculate it, set the headerMD5 field to all zeros and compute the MD5 hash of the entire encrypted part of the header.

Note that the file data is neither hashed nor cryptographically signed.

ivXor setup

ivXor is initially zero; when required, its value is updated with the following function:

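// _ivXor is a 64-bit (ulong) field; byte* requires an unsafe context
unsafe void CalcIVXor(byte* pInput, int count)
{
    for (int i = 0; i < count; i++)
    {
        // Split the 64-bit state into its low and high 32-bit halves
        uint lopart = (uint)(_ivXor & 0xFFFFFFFF);
        uint hipart = (uint)(_ivXor >> 32);
        // Rotate the state left by 13 bits and mix the input byte into the low half
        lopart = (pInput[i] | 0xAAu) ^ ((lopart << 13) | (hipart >> 19));
        hipart = (uint)((_ivXor << 13) >> 32);
        // Widen before shifting: shifting a uint by 32 is a no-op in C#
        _ivXor = ((ulong)hipart << 32) | lopart;
    }
}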

This function gets called in two cases: when reading the .pak header, and when reading the contents of a .gbx file embedded in the pak.

For the header, the ivXor setup only happens if there are three or more folders and the name of the third folder is at least 4 characters long. The folder's name is converted to UTF-16 and CalcIVXor(&wszName[2], 4) is called (i.e. it runs on the third and fourth characters of the name, each of which is two bytes long). If either condition is not met, ivXor stays zero.
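The header case can be sketched in C as below. `calc_iv_xor` is a port of CalcIVXor that takes and returns the state explicitly instead of using a field; each iteration is equivalent to rotating the 64-bit state left by 13 bits and then XORing `byte | 0xAA` into its low bits. `header_iv_xor` is a hypothetical helper that assumes the folder names are plain ASCII, so their UTF-16LE form is simply each character followed by a zero byte; non-ASCII names would need a real UTF-16 conversion.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Port of CalcIVXor with explicit state: each step rotates the 64-bit value
   left by 13 bits and XORs (byte | 0xAA) into its low bits. */
static uint64_t calc_iv_xor(uint64_t iv_xor, const uint8_t *input, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t lo = (uint32_t)(iv_xor & 0xFFFFFFFFu);
        uint32_t hi = (uint32_t)(iv_xor >> 32);
        lo = (uint32_t)(input[i] | 0xAAu) ^ ((lo << 13) | (hi >> 19));
        hi = (uint32_t)((iv_xor << 13) >> 32);
        iv_xor = ((uint64_t)hi << 32) | lo;
    }
    return iv_xor;
}

/* Hypothetical helper: ivXor for the header, from the third folder's name.
   Assumes an ASCII name, so UTF-16LE is each char followed by 0x00. */
static uint64_t header_iv_xor(const char *const *folder_names, size_t num_folders)
{
    if (num_folders < 3 || strlen(folder_names[2]) < 4)
        return 0;                        /* conditions not met: stays zero */
    const char *name = folder_names[2];
    const uint8_t utf16[4] = {           /* &wszName[2], 4 bytes */
        (uint8_t)name[2], 0x00,
        (uint8_t)name[3], 0x00,
    };
    return calc_iv_xor(0, utf16, sizeof utf16);
}
```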

For .gbx files, CalcIVXor(&specificParentClassID, 4) is called every time ReadNode is called. First, the parent class of the current node's class is determined. Then, the corresponding specific class ID is determined (the reverse of finding the generalized class ID, as described in Class IDs). This ID, stored as little-endian bytes, is then used as input for CalcIVXor.

There is one special exception for .gbx files: if the specific class ID is 0x07031000 (Control::CControlText), 0x07001000 (Control::CControlBase) is used as input instead.
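The per-node update for embedded .gbx files, including the CControlText exception, might look like this in C. Determining the parent class and its specific class ID is not shown; `gbx_node_iv_xor` is a hypothetical helper that receives that ID together with the current ivXor state, and `calc_iv_xor` is a direct port of CalcIVXor with the state passed explicitly.

```c
#include <stddef.h>
#include <stdint.h>

/* Port of CalcIVXor with explicit state (rotate left 13, XOR byte | 0xAA). */
static uint64_t calc_iv_xor(uint64_t iv_xor, const uint8_t *input, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t lo = (uint32_t)(iv_xor & 0xFFFFFFFFu);
        uint32_t hi = (uint32_t)(iv_xor >> 32);
        lo = (uint32_t)(input[i] | 0xAAu) ^ ((lo << 13) | (hi >> 19));
        hi = (uint32_t)((iv_xor << 13) >> 32);
        iv_xor = ((uint64_t)hi << 32) | lo;
    }
    return iv_xor;
}

/* Hypothetical helper for one ReadNode call. The caller supplies the specific
   class ID of the node's parent class; that lookup is not shown. */
static uint64_t gbx_node_iv_xor(uint64_t iv_xor, uint32_t specific_parent_class_id)
{
    /* Exception: Control::CControlText is replaced by Control::CControlBase. */
    if (specific_parent_class_id == 0x07031000u)
        specific_parent_class_id = 0x07001000u;

    const uint8_t le[4] = {              /* class ID as little-endian bytes */
        (uint8_t)specific_parent_class_id,
        (uint8_t)(specific_parent_class_id >> 8),
        (uint8_t)(specific_parent_class_id >> 16),
        (uint8_t)(specific_parent_class_id >> 24),
    };
    return calc_iv_xor(iv_xor, le, sizeof le);
}
```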

Header versions 6-8

byte magic[8]: "NadeoPak"
uint32 version (6, 7, 8)
if (version >= 6)
{
  uint256    ContentsChecksum;  // SHA-256 checksum of the pack contents, starting at the next byte
  uint32     DecryptFlags;
  if (version >= 7)
  {
    struct SAuthorInfo
    {
      uint32 version;
      string Login;
      string Nick;
      string Zone;
      string ExtraInfo;
    } AuthorInfo;
    string   Comment;
    uint128  unused;
    if (version >= 8)
    {
      string CreationBuildInfo;
      string AuthorUrl;
    }
  }
}

Header versions 9+

byte magic[8]: "NadeoPak"
uint32 version (9 or higher)
if (version >= 6)
{
  uint256 ContentsChecksum;  // SHA-256 checksum of the pack contents, starting at the next byte
  uint32  DecryptFlags;
  if (version >= 15)
    uint32  HeaderMaxSize;  // 0x4000 = small (16 KB), 0x100000 = big (1 MB), 0x1000000 = huge (16 MB)
  if (version >= 9)
  {
    struct SAuthorInfo
    {
      uint32 version;
      string Login;
      string Nick;
      string Zone;
      string ExtraInfo;
    } AuthorInfo;
    string      ManialinkUrl;
    if (version >= 13)
      string    DownloadUrl;
    uint64      CreationDate;
    string      Comment;
    if (version >= 12)
    {
      string    Xml;
      string    TitleID;
    }
    string      UsageSubDir;  // indicates what kind of pack this is
    string      CreationBuildInfo;
    uint128     unused;
    if (version >= 10)
    {
      uint32  NbIncludedPacks;
      struct SIncludedPacksHeaders
      {
        uint256     ContentsChecksum;  // Sha256
        string      Name;
        SAuthorInfo AuthorInfo;
        string      InfoManialinkUrl;
        uint64      CreationDate;
        string      Name;
        if (version >= 11)
          uint32    IncludeDepth;
      } IncludedPacks[NbIncludedPacks];
    }
  }
}
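Since the headers of versions 6+ begin with a plaintext section, their first fields can be read without any key. The following C sketch parses them from an in-memory buffer up to ManialinkUrl and ignores the remaining fields for brevity. It assumes all integers are little-endian and that `string` uses the GBX encoding (a uint32 byte length followed by the characters); AuthorInfo is read for versions 7+, which matches both header layouts above. The reader, struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bounds-checked reader over an in-memory buffer (little-endian assumed). */
typedef struct { const uint8_t *buf; size_t len, pos; int err; } pak_reader;

static const uint8_t *take(pak_reader *r, size_t n)
{
    if (r->err || r->len - r->pos < n) { r->err = 1; return NULL; }
    const uint8_t *p = r->buf + r->pos;
    r->pos += n;
    return p;
}

static uint32_t read_u32(pak_reader *r)
{
    const uint8_t *p = take(r, 4);
    if (!p) return 0;
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* GBX-style string: uint32 byte length, then the bytes (no NUL terminator). */
typedef struct { const char *s; uint32_t len; } pak_str;

static pak_str read_str(pak_reader *r)
{
    pak_str out;
    out.len = read_u32(r);
    out.s = (const char *)take(r, out.len);
    if (!out.s) out.len = 0;
    return out;
}

typedef struct {
    uint32_t version, decrypt_flags, header_max_size, author_version;
    uint8_t contents_checksum[32];
    pak_str login, nick, zone, extra_info, manialink_url;
} pak_header_prefix;

/* Parses the plaintext header of versions 6+ up to ManialinkUrl.
   Returns 0 on success, -1 on bad magic, unsupported version or short buffer. */
static int parse_pak_header_prefix(const uint8_t *buf, size_t len, pak_header_prefix *h)
{
    pak_reader r = { buf, len, 0, 0 };
    memset(h, 0, sizeof *h);

    const uint8_t *magic = take(&r, 8);
    if (!magic || memcmp(magic, "NadeoPak", 8) != 0) return -1;
    h->version = read_u32(&r);
    if (h->version < 6) return -1;        /* this sketch handles versions 6+ only */

    const uint8_t *sum = take(&r, 32);    /* ContentsChecksum (SHA-256) */
    if (sum) memcpy(h->contents_checksum, sum, 32);
    h->decrypt_flags = read_u32(&r);
    if (h->version >= 15) h->header_max_size = read_u32(&r);
    if (h->version >= 7) {                /* SAuthorInfo AuthorInfo */
        h->author_version = read_u32(&r);
        h->login = read_str(&r);
        h->nick = read_str(&r);
        h->zone = read_str(&r);
        h->extra_info = read_str(&r);
    }
    if (h->version >= 9) h->manialink_url = read_str(&r);
    return r.err ? -1 : 0;
}
```

Returning slices into the caller's buffer avoids copying; the caller must keep the buffer alive while using the parsed strings.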

Data

The content of each file starts at Header.dataOffset + FileEntry.offset in the .pak file. First, an 8-byte plaintext IV is read. Then, FileEntry.compressedSize bytes are read and decrypted using Blowfish in CBC mode, with the same key that was used to decrypt the header. If FileEntry.flags & 0x7C is not zero, the file is compressed and must be decompressed with zlib (deflate format) after decryption, yielding FileEntry.uncompressedSize bytes.
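Locating a file's data and deciding whether to inflate it comes down to a little arithmetic on the header and FileEntry fields. A minimal C sketch with hypothetical struct and function names; the actual decryption and zlib decompression are left out.

```c
#include <stdint.h>

/* Minimal view of the FileEntry fields needed to locate file data. */
typedef struct {
    uint32_t uncompressed_size, compressed_size, offset;
    uint64_t flags;
} pak_file_entry;

typedef struct {
    uint64_t iv_pos;       /* position of the 8-byte plaintext IV */
    uint64_t cipher_pos;   /* first encrypted byte */
    uint32_t cipher_len;   /* number of encrypted bytes to read */
    int compressed;        /* nonzero: zlib-decompress after decryption */
} pak_data_span;

static pak_data_span pak_locate_file(uint32_t data_offset, const pak_file_entry *f)
{
    pak_data_span s;
    s.iv_pos = (uint64_t)data_offset + f->offset;  /* Header.dataOffset + FileEntry.offset */
    s.cipher_pos = s.iv_pos + 8;                   /* ciphertext follows the IV */
    s.cipher_len = f->compressed_size;
    s.compressed = (f->flags & 0x7C) != 0;
    return s;
}
```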

The type of the file can be determined from the extension in its name or, if there is none (many file names are actually just hashes), from the class ID.

GBX compression intricacies

There are some not-so-obvious details about compressed .gbx files which may not seem important at first sight but make a world of difference when extracting them; ignoring them will result in corrupt data. (Note: this concerns the zlib compression specific to .pak files, not the LZO compression of .gbx data sections. .gbx files inside a .pak do not have LZO-compressed data sections (BUUR header).)

  • Files from a .pak are not decrypted and then decompressed in their entirety before they are parsed. Instead, the parser requests data from the decompressor as it goes along, which in turn requests data from the decrypter. Whenever the decompressor's buffer is empty and new data is requested, it requests blocks of 0x100 bytes from the decrypter and decompresses each block into its buffer of 0x400 bytes, until the buffer is full. Then, whenever the parser requests more data, it is simply copied over from the buffer – until the buffer is empty again.
  • You must use the zlib library. Don't use a different zlib-compatible implementation, and also don't skip over the zlib header and decompress the data with a deflate implementation. TrackMania very much depends on zlib's behaviour: how soon it starts returning decompressed data after compressed data has been put into it, and how much decompressed data it returns on every iteration.

Both requirements stem from the ivXor trick. Its value is updated every time a new node is read, but what matters is when the new value takes effect. The decompressor reads 0x400 uncompressed bytes ahead each time, using a single ivXor value, and keeps the data in a buffer. Any ivXor values calculated after this do not take effect until the buffer is empty again. So you need to read ahead the correct number of uncompressed bytes, and the number of compressed bytes consumed to reach that point must also be correct (which is implementation-specific).

Metadata

The metadata section comes at the very end of the .pak file, after all the file contents. It attaches dataless .gbx files to specific files in the archive ("dataless" meaning that the files contain everything up to and including the numNodes field as described on the GBX page, but nothing further). The section begins in the .pak at offset Header.metaDataOffset and is zlib-compressed and then encrypted in its entirety. For encryption, the same algorithm and key are used as with the header and data.

The format is very simple:

  • uint64 IV
  • compressed + encrypted:
    • 0 or more times:
      • uint32 fileIndex (index into Header.files indicating which file the following metadata belongs to)
      • byte[] metaGBX
    • int32 terminatingIndex = -1

The length of each metaGBX is not stored explicitly in this table. You have to parse the gbx's header to find this out.
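Because the entry lengths are implicit, the metadata table has to be walked one entry at a time. A minimal C sketch of reading the next fileIndex from the already decrypted and decompressed table; `pak_meta_next` is a hypothetical helper, and measuring each metaGBX is assumed to be done by a separate GBX header parser (not shown).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reads the next fileIndex at *pos. Returns 1 if an
   entry follows (*pos then points at its metaGBX), 0 if the terminating
   index -1 was reached, or -1 if the buffer ends too early. */
static int pak_meta_next(const uint8_t *p, size_t len, size_t *pos, uint32_t *file_index)
{
    if (*pos > len || len - *pos < 4)
        return -1;
    const uint8_t *q = p + *pos;
    uint32_t v = (uint32_t)q[0] | (uint32_t)q[1] << 8 |
                 (uint32_t)q[2] << 16 | (uint32_t)q[3] << 24;
    *pos += 4;
    if (v == 0xFFFFFFFFu)                /* int32 terminatingIndex = -1 */
        return 0;
    *file_index = v;
    return 1;
}
```

A caller would loop: call `pak_meta_next`, parse the GBX header at `*pos` to learn the metaGBX length, advance `*pos` by that length, and stop once the function returns 0.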

One important thing to note is that the size of the compressed metadata must be a multiple of 0x100 bytes (excluding the 8 IV bytes that are prepended during encryption). If it isn't, TrackMania will not read the trailing part that falls short of a full 0x100-byte block, and will instead try to parse garbage bytes, which will most likely result in a crash.
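When writing a .pak, the required size is the compressed metadata length rounded up to the next multiple of 0x100. A small C helper with a hypothetical name; the contents of the padding bytes are not specified here, so zero-filling them is an assumption.

```c
#include <stddef.h>

/* Rounds a compressed metadata size (IV bytes excluded) up to a multiple of 0x100. */
static size_t pak_meta_padded_size(size_t compressed_size)
{
    return (compressed_size + 0xFF) & ~(size_t)0xFF;
}
```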

Tools

  • TMPakTool (@archive.org) - an open source tool which can open and edit .pak files in an Explorer-like interface. Comes with a C# library which you can use in your own applications to work with .pak files. Download & Source code
  • GbxDump - a Windows tool to dump and analyze the headers of .Challenge|Map.Gbx, .Replay.Gbx, .Pack.Gbx|.pak, .ObjectInfo.Gbx and .Item.Gbx files.
  • GBX Data Fetcher - a PHP module with classes to extract useful data from .Challenge|Map.Gbx files (including the thumbnail image), .Replay.Gbx files and .Pack.Gbx|.pak files, and parse their XML blocks.
  • Extract GBX data - a PHP script to format and print data from all .Gbx file types supported by the GBX Data Fetcher module.