PAK

Revision as of 17:53, 30 March 2019

TrackMania .pak files are archives that contain a collection of other files, much like .zip archives. They are found in the "Packs" folder in the game installation directory. In ManiaPlanet, there are also .Pack.Gbx files with the same purpose.

In older TM versions, .pak files are both compressed and encrypted (version 3); even the file index containing the file names and directory structure is encrypted. In ManiaPlanet, they also include an uncompressed, unencrypted section (versions 6+).

Encryption

.pak files are encrypted using Blowfish in CBC mode with a 16-byte key. When an encrypted block begins, decryption is initialized by reading an 8-byte, plaintext IV (initialization vector) from the file. From then on, Blowfish decryption commences.

Each .pak file has its own encryption key. The keys for the different packs are found in packlist.dat.

There is one gotcha where Nadeo deviates from regular CBC: on the first read and after every 256 bytes read, the current IV is XORed with a value we'll call ivXor (this happens before the IV is applied to the Blowfish-decrypted block). ivXor is initialized to zero and is reset to zero every time it is applied, so most of the time it has no effect. Crucially, though, it is assigned a non-zero value while the .pak header is being read, and the effect usually kicks in somewhere in the middle of the file list. If you don't take this into account, half of your file table will be garbled (which is likely what Nadeo intended with this trick). The same goes for .gbx files embedded in packs.
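
The modified CBC loop described above can be sketched in Python as follows. This is a minimal illustration, not Nadeo's code: decrypt_block is a hypothetical stand-in for Blowfish block decryption with the pack key, the class and attribute names are made up, and applying ivXor to the IV as eight little-endian bytes is an assumption.

```python
class PakCbcDecryptor:
    """CBC decryption with the ivXor twist (sketch).

    decrypt_block: hypothetical callable that decrypts one 8-byte block
    (Blowfish with the pack key in the real format).
    """

    BLOCK = 8

    def __init__(self, decrypt_block, iv):
        self.decrypt_block = decrypt_block
        self.iv = bytes(iv)   # 8-byte plaintext IV read from the file
        self.iv_xor = 0       # 64-bit value, zero unless the parser sets it
        self.bytes_read = 0

    @staticmethod
    def _xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def decrypt(self, ciphertext):
        """Decrypt ciphertext whose length is a multiple of 8 bytes."""
        out = bytearray()
        for pos in range(0, len(ciphertext), self.BLOCK):
            block = ciphertext[pos:pos + self.BLOCK]
            if self.bytes_read % 256 == 0:
                # First read and every 256 bytes: fold ivXor into the IV,
                # then reset it. Assumption: little-endian byte order.
                self.iv = self._xor(self.iv, self.iv_xor.to_bytes(8, "little"))
                self.iv_xor = 0
            out += self._xor(self.decrypt_block(block), self.iv)
            self.iv = block   # regular CBC chaining
            self.bytes_read += self.BLOCK
        return bytes(out)
```

With iv_xor left at zero this behaves exactly like ordinary CBC decryption, which is why the deviation goes unnoticed until a non-zero value has been set.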

File structure

A description of the used data types can be found on the GBX page.

Header version 3

byte magic[8];  // "NadeoPak"
uint32 version; // 3
uint64 headerIV;
Blowfish encrypted:
{
    uint128 headerMD5;
    uint32 gbxHeadersStart; // offset to metadata section
    uint32 dataStart;
    if version >= 2:
    {
        uint32 gbxHeadersSize;
        uint32 gbxHeadersComprSize;
    }
    if version >= 3:
        uint128 unused;
    uint32 flags;
    uint32 numFolders;
    FolderDesc folders[numFolders]
    {
        uint32 parentFolderIndex; // index into folders; -1 if this is a root folder
        string name;
    }
    Set up ivXor;
    uint32 numFiles;
    FileDesc files[numFiles]
    {
        uint32 folderIndex; // index into folders
        string name;
        uint32 unknown;
        uint32 uncompressedSize;
        uint32 compressedSize;
        uint32 offset;
        uint32 classID; // indicates the type of the file
        uint64 flags;
    }
}
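
As an illustration of the layout above, here is a minimal Python sketch that parses the already decrypted part of a version 3 header. It assumes little-endian integers and that a string is stored as a uint32 byte length followed by the characters (check the GBX page for the authoritative definition). The function name and the dictionary keys are made up for this example.

```python
import struct

def parse_header_v3(data):
    """Parse the decrypted part of a version 3 header (sketch)."""
    pos = 0

    def u32():
        nonlocal pos
        (value,) = struct.unpack_from("<I", data, pos)
        pos += 4
        return value

    def string():
        # Assumption: uint32 byte length followed by the characters.
        nonlocal pos
        length = u32()
        text = data[pos:pos + length].decode("utf-8", errors="replace")
        pos += length
        return text

    header = {"md5": data[0:16]}
    pos = 16
    header["gbxHeadersStart"] = u32()
    header["dataStart"] = u32()
    header["gbxHeadersSize"] = u32()
    header["gbxHeadersComprSize"] = u32()
    pos += 16                      # uint128 unused
    header["flags"] = u32()

    folders = []
    for _ in range(u32()):
        (parent,) = struct.unpack_from("<i", data, pos)  # -1 marks a root folder
        pos += 4
        folders.append({"parent": parent, "name": string()})

    files = []
    for _ in range(u32()):
        # Dict values are evaluated left to right, matching the field order.
        entry = {"folderIndex": u32(), "name": string(), "unknown": u32(),
                 "uncompressedSize": u32(), "compressedSize": u32(),
                 "offset": u32(), "classID": u32()}
        (entry["flags"],) = struct.unpack_from("<Q", data, pos)
        pos += 8
        files.append(entry)

    header["folders"], header["files"] = folders, files
    return header
```

The sketch does not deal with decryption or ivXor; it expects the caller to pass plaintext bytes.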

Header MD5

The header contains a checksum hash of itself for integrity checking purposes. To calculate it, set the headerMD5 field to all zeros and calculate the MD5 hash of the entire encrypted part of the header.

It should be noted that the file data is not hashed, nor cryptographically signed.
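
A minimal Python sketch of this check, assuming the hash is taken over the decrypted bytes of the encrypted header section (with the 16-byte MD5 field at its start zeroed) and that the caller passes exactly that section. The function name is hypothetical.

```python
import hashlib

def header_md5_ok(decrypted_header):
    """Compare the stored header MD5 with a freshly computed one (sketch)."""
    stored = decrypted_header[:16]
    # Zero the MD5 field before hashing, as described above.
    zeroed = bytes(16) + decrypted_header[16:]
    return hashlib.md5(zeroed).digest() == stored
```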

ivXor setup

ivXor is initially zero, but if requested, its value can be changed through the following function:

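void CalcIVXor(byte* pInput, int count)
{
    // _ivXor is a 64-bit unsigned field; every input byte rotates it and is mixed into the low half.
    for (int i = 0; i < count; i++)
    {
        uint lopart = (uint)(_ivXor & 0xFFFFFFFF);
        uint hipart = (uint)(_ivXor >> 32);
        lopart = (uint)(pInput[i] | 0xAA) ^ ((lopart << 13) | (hipart >> 19));
        hipart = (uint)((_ivXor << 13) >> 32);
        // Widen to 64 bits before shifting; shifting a 32-bit value by 32 would lose the high half.
        _ivXor = ((ulong)hipart << 32) | lopart;
    }
}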

This function gets called in two cases: when reading the .pak header, and when reading the contents of a .gbx file embedded in the pak.

For the header, the ivXor setup only happens if there are three or more folders and the name of the third folder is four or more characters long. The folder's name is converted to UTF-16 and CalcIVXor(&wszName[2], 4) is called (i.e. it runs on the third and fourth characters of the name, each character being two bytes). If either of these conditions is not met, ivXor stays zero.
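
For experimentation, the function and the header rule can be ported to Python as sketched here. The names calc_iv_xor and header_iv_xor are made up, Python string length is assumed to equal the number of UTF-16 code units (true for names without characters outside the Basic Multilingual Plane), and the port returns the new value instead of updating a field.

```python
MASK32 = 0xFFFFFFFF

def calc_iv_xor(iv_xor, data):
    """Python port of CalcIVXor: returns the updated 64-bit ivXor."""
    for byte in data:
        lo = iv_xor & MASK32
        hi = iv_xor >> 32
        # Python integers are unbounded, so mask where C# would truncate.
        new_lo = (byte | 0xAA) ^ (((lo << 13) & MASK32) | (hi >> 19))
        new_hi = ((iv_xor << 13) >> 32) & MASK32
        iv_xor = (new_hi << 32) | new_lo
    return iv_xor

def header_iv_xor(folder_names):
    """ivXor after the folder list of a .pak header has been read (sketch)."""
    if len(folder_names) < 3 or len(folder_names[2]) < 4:
        return 0
    utf16 = folder_names[2].encode("utf-16-le")
    return calc_iv_xor(0, utf16[4:8])   # third and fourth character
```

Each loop iteration amounts to rotating the 64-bit value left by 13 bits and XORing (byte | 0xAA) into the result.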

For .gbx files, CalcIVXor(&specificParentClassID, 4) is called every time ReadNode is called. First, the parent class of the current node's class is determined. Then, the specific class ID corresponding to this is determined (which is the exact opposite process of finding the generalized class ID as described in Class IDs). This ID, stored as little endian bytes, is then used as input for CalcIVXor.

There is one special exception with .gbx files. If the specific class ID is 0x07031000 (Control::CControlText), 0x07001000 (Control::CControlBase) is used as input instead.
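
The input bytes for the .gbx case can be sketched in Python as follows. Finding the parent class and its specific class ID requires a class hierarchy table and is not shown; the function and constant names are hypothetical.

```python
import struct

CONTROL_TEXT = 0x07031000   # Control::CControlText
CONTROL_BASE = 0x07001000   # Control::CControlBase

def iv_xor_input(specific_parent_class_id):
    """Return the 4 bytes passed to CalcIVXor for a node's parent class."""
    if specific_parent_class_id == CONTROL_TEXT:
        specific_parent_class_id = CONTROL_BASE   # documented exception
    return struct.pack("<I", specific_parent_class_id)  # little-endian bytes
```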

Header versions 6+

byte magic[8];  // "NadeoPak"
uint32 version;
if (version >= 6)
{
    uint256 ContentsChecksum;    // Checksum Sha256 of the pack contents starting at next byte
    struct SHeaderFlagsUncrypt
    {
        uint32 IsHeaderPrivate     : 1;
        uint32 UseDefaultHeaderKey : 1;
        uint32 IsDataPrivate       : 1;
        uint32 IsImpostor          : 1;
        uint32 __Unused__          : 28;
    };
    if (version >= 15)
        uint32 HeaderMaxSize;    // 0x4000 = Small (16 KB), 0x100000 = Big (1 MB), 0x1000000 = Huge (16 MB)
    if (version >= 7)
    {
        struct SAuthorInfo
        {
            uint32 Version;
            string Login;
            string Nick;
            string Zone;
            string ExtraInfo;
        };
        if (version < 9)
        {
            string  Comment;
            uint128 unused;
        }
        if (version == 8)
        {
            string CreationBuildInfo;
            string AuthorUrl;
        }
        if (version >= 9)
        {
            string ManialinkUrl;
            if (version >= 13)
                string DownloadUrl;
            uint64 CreationDate;  // Win32 FILETIME structure
            string Comment;
            if (version >= 12)
            {
                string Xml;
                string TitleID;
            }
            string UsageSubDir;  // to know the kind of pack it is
            string CreationBuildInfo;
            uint128 unused;
            if (version >= 10)
            {
                uint32 NbIncludedPacks;
                struct SIncludedPacksHeaders
                {
                    uint256     ContentsChecksum; // Sha256
                    string      Name;
                    SAuthorInfo AuthorInfo;
                    string      InfoManialinkUrl;
                    uint64      CreationDate;
                    string      Name;
                    if (version >= 11)
                        uint32 IncludeDepth;
                } IncludedPacks[];
            }
        }
    }
    Blowfish encrypted:  // Unencrypted, if (DecryptFlags & 0x3) == 0
    {
        uint128 Checksum;
        uint32 GbxHeadersStart;  // Offset to the metadata section
        if version < 15:
            uint32 DataStart;     // If version >= 15: DataStart = HeaderMaxSize
        if version >= 2:
        {
            uint32 GbxHeadersSize;
            uint32 GbxHeadersComprSize;
        }
        if version >= 14:
            uint128 unused;
        if version >= 16:
            uint32 FileSize;
        if version >= 3:
            uint128 unused;
        if version == 6:
            SAuthorInfo;
        uint32 Flags;
        uint32 NumFolders;
        FolderDesc Folders[NumFolders]
        {
            int32 FolderIndexParent;
            string FolderName;
        }
        uint32 NumFiles;
        FileDesc Files[NumFiles]
        {
            int32 FolderIndex;
            string FileName;
            uint32 unknown;
            uint32 UncompressedSize;
            uint32 CompressedSize;
            uint32 Offset;
            uint32 classID;
            if version >= 17:
                uint32 Size;
            if version >= 14:
                uint128 Checksum;
            struct SFileDescFlags
            {
                uint32 IsHashed          : 1;
                uint32 PublishFid        : 1;
                uint32 Compression       : 4;
                uint32 IsSeekable        : 1;
                uint32 _Unknown_         : 1;
                uint32 __Unused1__       : 24;
                uint32 DontUseDummyWrite : 1;
                uint32 OpaqueUserData    : 16;
                uint32 PublicFile        : 1;
                uint32 ForceNoCrypt      : 1;
                uint32 __Unused2__       : 13;
            };
        }
    }
}
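
A small Python sketch of how the unencrypted header flags might be decoded. It assumes that bit fields are allocated starting at the least significant bit and that DecryptFlags in the comment above refers to the SHeaderFlagsUncrypt value. The mask 0x3 then covers IsHeaderPrivate and UseDefaultHeaderKey. Function names are made up.

```python
def decode_header_flags(flags):
    """Split SHeaderFlagsUncrypt into named booleans (LSB-first assumption)."""
    return {
        "IsHeaderPrivate": bool(flags & 0x1),
        "UseDefaultHeaderKey": bool(flags & 0x2),
        "IsDataPrivate": bool(flags & 0x4),
        "IsImpostor": bool(flags & 0x8),
    }

def header_is_encrypted(flags):
    # The header is unencrypted if (flags & 0x3) == 0.
    return (flags & 0x3) != 0
```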

Data

The content of each file starts at Header.dataStart + FileDesc.offset in the .pak file. From version 15 on, the data block starts at HeaderMaxSize. First, an 8-byte plaintext IV is read. Then, FileDesc.compressedSize bytes are read and decrypted using Blowfish in CBC mode, with the same key that was used to decrypt the header. If FileDesc.flags & 0x7C is not zero, the file is compressed and must be decompressed with zlib after decryption (yielding FileDesc.uncompressedSize bytes).

The type of the file can be found from the extension in the name, or, if this is not available (many file names are actually just hashes), from the class ID.
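
The extraction steps can be sketched in Python like this. decrypt_cbc is a hypothetical callable standing in for the Blowfish CBC decryption (including any ivXor handling), and desc is a plain dictionary invented for this example. The sketch decompresses the file in one go, so it does not reproduce the streaming behaviour that matters for embedded .gbx files.

```python
import zlib

def extract_file(pak, data_start, desc, decrypt_cbc):
    """Return the plain contents of one file from a .pak image (sketch).

    pak:         the whole .pak file as bytes
    data_start:  Header.dataStart (HeaderMaxSize from version 15 on)
    desc:        dict with 'offset', 'compressedSize', 'uncompressedSize', 'flags'
    decrypt_cbc: hypothetical callable(iv, ciphertext) -> plaintext
    """
    start = data_start + desc["offset"]
    iv = pak[start:start + 8]                      # plaintext IV
    encrypted = pak[start + 8:start + 8 + desc["compressedSize"]]
    data = decrypt_cbc(iv, encrypted)
    if desc["flags"] & 0x7C:                       # compressed file
        data = zlib.decompress(data)
        if len(data) != desc["uncompressedSize"]:
            raise ValueError("unexpected uncompressed size")
    return data
```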

GBX compression intricacies

There are some not-so-obvious details about compressed .gbx files, which may not seem important at first sight but actually make a world of difference when extracting them. Not addressing these will result in corrupt data. Note that this is about the deflate compression specific to .pak files, not the LZO compression specific to .gbx data sections; .gbx files in a .pak don't have LZO-compressed data sections (BUUR header).

Files from a .pak are not decrypted and then decompressed in their entirety before they are parsed. Instead, the parser requests data from the decompressor as it goes along, which in turn requests data from the decrypter. Whenever the decompressor's buffer is empty and new data is requested, it requests blocks of 0x100 bytes from the decrypter and decompresses each block into its buffer of 0x400 bytes, until the buffer is full. Then, whenever the parser requests more data, it is simply copied over from the buffer – until the buffer is empty again.

You must use the zlib library. Don't use a different zlib-compatible implementation, and also don't skip over the zlib header and decompress the data with a deflate implementation. TrackMania very much depends on zlib's behaviour: how soon it starts returning decompressed data after compressed data has been put into it, and how much decompressed data it returns on every iteration.

The reason for these is the ivXor trick. Its value gets updated every time a new node is read, but what is crucial is when its new value will go into effect. The decompressor reads 0x400 uncompressed bytes ahead every time, using one ivXor value, and keeps its data in a buffer. All newly calculated ivXor values after this will not come into effect until this buffer is empty again. So of course you need to read ahead the correct number of uncompressed bytes, but the number of compressed bytes you have read to get to that point also has to be correct (which is implementation specific).
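
The data flow between parser, decompressor and decrypter can be sketched in Python as follows. This only illustrates the buffering scheme; it is not guaranteed to consume compressed bytes at exactly the same points as the game does. read_decrypted is a hypothetical callable that returns the next n decrypted bytes, and the class name is made up. Python's zlib module is normally a binding to the zlib library, but that depends on the build.

```python
import zlib

class PakInflater:
    """Pull-based decompressor mirroring the buffering described above (sketch)."""

    IN_BLOCK = 0x100    # bytes requested from the decrypter at a time
    OUT_BUFFER = 0x400  # size of the decompressed read-ahead buffer

    def __init__(self, read_decrypted):
        self._read = read_decrypted      # hypothetical callable(n) -> bytes
        self._inflate = zlib.decompressobj()
        self._pending = b""              # compressed bytes not yet consumed
        self._buffer = b""

    def _refill(self):
        out = bytearray()
        while len(out) < self.OUT_BUFFER and not self._inflate.eof:
            data = self._pending or self._read(self.IN_BLOCK)
            chunk = self._inflate.decompress(data, self.OUT_BUFFER - len(out))
            self._pending = self._inflate.unconsumed_tail
            out += chunk
            if not data and not chunk:
                break                    # source exhausted
        self._buffer = bytes(out)

    def read(self, size):
        result = bytearray()
        while len(result) < size:
            if not self._buffer:
                # The decrypter is only asked for more data here, so a newly
                # computed ivXor cannot take effect any earlier.
                self._refill()
                if not self._buffer:
                    break
            take = size - len(result)
            result += self._buffer[:take]
            self._buffer = self._buffer[take:]
        return bytes(result)
```

A parser would call read() for every field it needs and update ivXor on the decrypter side between calls.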

Metadata

The metadata section comes at the very end of the .pak file, after all the file contents. It attaches dataless .gbx files to specific files in the archive ("dataless" meaning that the files contain everything up to and including the numNodes field as described on the GBX page, but nothing further). The section begins in the .pak at offset Header.gbxHeadersStart and is zlib-compressed and then encrypted in its entirety. For encryption, the same algorithm and key are used as with the header and data.

The format is very simple:

uint64 IV
compressed + encrypted:
- 0 or more times:
  - uint32 fileIndex (index into Header.files indicating which file the following metadata belongs to)
  - byte[] metaGBX
- int32 terminatingIndex = -1

The length of each metaGBX is not stored explicitly in this table. You have to parse the gbx's header to find this out.
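
Walking the table could look like the following Python sketch, which operates on the already decrypted and decompressed bytes. Since the entry lengths have to come from parsing each .gbx header, that step is delegated to gbx_header_length, a hypothetical callable that is not implemented here.

```python
import struct

def parse_metadata(table, gbx_header_length):
    """Split the decrypted, decompressed metadata table into entries (sketch).

    gbx_header_length: hypothetical callable(table, pos) -> int that parses
    the dataless .gbx starting at pos and returns its length in bytes.
    """
    entries = {}
    pos = 0
    while True:
        # Read as signed so that the terminator shows up as -1.
        (file_index,) = struct.unpack_from("<i", table, pos)
        pos += 4
        if file_index == -1:          # terminatingIndex
            break
        length = gbx_header_length(table, pos)
        entries[file_index] = table[pos:pos + length]
        pos += length
    return entries
```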

One important thing to note is that the size of the compressed metadata must be a multiple of 0x100 bytes (excluding the 8 IV bytes that are prepended during encryption). If it isn't, TrackMania will not read the final partial block of the metadata and will instead try to parse garbage bytes, which will most likely result in a crash.
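
When writing a pack, the size constraint can be checked with a few lines of Python. The helper names are made up, and how the compressed stream is brought up to the required size is not covered here.

```python
BLOCK = 0x100

def metadata_size_ok(compressed_size):
    """True if the compressed metadata size (without the IV) is valid."""
    return compressed_size % BLOCK == 0

def next_valid_size(compressed_size):
    """Smallest multiple of 0x100 that is >= compressed_size."""
    return -(-compressed_size // BLOCK) * BLOCK
```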

Tools

TMPakTool (@archive.org) - an open source tool which can open and edit .pak files in an Explorer-like interface. Comes with a C# library which you can use in your own applications to work with .pak files. Download & Source code
GbxDump - a Windows tool to dump and analyze the headers of .Challenge|Map.Gbx, .Replay.Gbx, .Pack.Gbx|.pak, .ObjectInfo.Gbx and .Item.Gbx files.
GBX Data Fetcher - a PHP module with classes to extract useful data from .Challenge|Map.Gbx files (including the thumbnail image), .Replay.Gbx files and .Pack.Gbx|.pak files, and parse their XML blocks.
Extract GBX data - a PHP script to format and print data from all .Gbx file types supported by the GBX Data Fetcher module.