Parsing 3D file formats

Nov. 18, 2011

I am writing a game (no link yet) and I need some assets. The obvious thing to do is to parse some 3D file format and extract the data I need. I have tried a couple of formats and implemented parsers for some of them; the following article describes the file formats I've encountered and tried to use.



Use the 3DS file format.


I have published the incomplete Milkshape 3D parser and the fairly usable 3D Studio parser on GitHub.

Note that these parsers convert the input data to JSON. You can treat them as an example of how to interpret these files.

Count me in - 3D Studio Format

3D Studio (3ds) files use a binary format, so you have to read a reference to implement it. Unlike the Milkshape 3D format, however, it represents a tree, much like JSON or XML, only binary. A big advantage of the tree structure is that you can ignore the parts you don't understand by simply jumping over them.
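To make the "jump over it" idea concrete, here is a minimal Python sketch of a chunk walker. It assumes the commonly documented 3ds chunk header (a little-endian 16-bit chunk ID followed by a 32-bit length that includes the 6-byte header) and a small, hand-picked set of chunk IDs (0x4D4D main, 0x3D3D editor, 0x4000 object block, 0x4100 mesh). It is not the code from my GitHub parser; `walk`, `iter_chunks` and the test helper `make_chunk` are hypothetical names. Any chunk it doesn't recognize is skipped by its length, never parsed.

```python
import struct

# Assumed 3ds chunk header: chunk ID (uint16) + chunk length incl. this 6-byte header (uint32).
HEADER = struct.Struct("<HI")

# Hand-picked chunks whose body is only sub-chunks (a partial list, not the full spec).
CONTAINERS = {
    0x4D4D,  # main chunk
    0x3D3D,  # 3D editor chunk
    0x4100,  # triangular mesh
}
OBJECT_BLOCK = 0x4000  # null-terminated object name, then sub-chunks


def iter_chunks(data, start, end):
    """Yield (chunk_id, body_start, chunk_end) for each sibling chunk in data[start:end]."""
    pos = start
    while pos + HEADER.size <= end:
        chunk_id, length = HEADER.unpack_from(data, pos)
        if length < HEADER.size or pos + length > end:
            raise ValueError(f"corrupt chunk 0x{chunk_id:04X} at offset {pos}")
        yield chunk_id, pos + HEADER.size, pos + length
        pos += length  # jump over the whole chunk, whether we understood it or not


def walk(data, start=0, end=None, depth=0, seen=None):
    """Return [(depth, chunk_id), ...]; unknown chunks are recorded but never parsed."""
    end = len(data) if end is None else end
    seen = [] if seen is None else seen
    for chunk_id, body, chunk_end in iter_chunks(data, start, end):
        seen.append((depth, chunk_id))
        if chunk_id == OBJECT_BLOCK:
            name_end = data.index(b"\x00", body, chunk_end) + 1  # skip the object name
            walk(data, name_end, chunk_end, depth + 1, seen)
        elif chunk_id in CONTAINERS:
            walk(data, body, chunk_end, depth + 1, seen)
    return seen


def make_chunk(chunk_id, body=b""):
    """Hypothetical helper that builds a chunk by hand, e.g. for tests."""
    return HEADER.pack(chunk_id, HEADER.size + len(body)) + body


# Example: a main chunk holding an opaque chunk and an editor chunk with one object.
mesh = make_chunk(0x4100, make_chunk(0x4110, b"\x00\x00"))  # mesh with an empty vertex list
editor = make_chunk(0x3D3D, make_chunk(OBJECT_BLOCK, b"Box\x00" + mesh))
data = make_chunk(0x4D4D, make_chunk(0x1234, b"opaque bytes") + editor)
for depth, chunk_id in walk(data):
    print("  " * depth + f"0x{chunk_id:04X}")
```

Because every chunk carries its own length, supporting another chunk type later only means adding a branch; the rest of the walk stays the same.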

Unlike Collada or X files, 3ds files are not completely generic. They are very concrete, so making sense of the data in them is fairly easy.
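As an illustration of how concrete the data is, decoding a mesh's vertex and face lists takes a few lines each. This sketch assumes the layouts given in commonly circulated 3ds references: the vertex list chunk (0x4110) is a 16-bit count followed by that many x, y, z 32-bit floats, and the face list chunk (0x4120) is a 16-bit count followed by that many a, b, c, flags 16-bit values. The function names are hypothetical, and each takes the chunk body without its 6-byte header.

```python
import struct


def parse_vertex_list(body):
    """Decode a 0x4110 vertex list body: uint16 count, then count * (x, y, z) float32."""
    (count,) = struct.unpack_from("<H", body, 0)
    flat = struct.unpack_from(f"<{count * 3}f", body, 2)
    return [flat[i:i + 3] for i in range(0, len(flat), 3)]


def parse_face_list(body):
    """Decode a 0x4120 face list body: uint16 count, then count * (a, b, c, flags) uint16.

    Only the vertex indices are returned; the per-face flags and any sub-chunks
    that follow the faces (such as material groups) are ignored here.
    """
    (count,) = struct.unpack_from("<H", body, 0)
    flat = struct.unpack_from(f"<{count * 4}H", body, 2)
    return [flat[i:i + 3] for i in range(0, len(flat), 4)]


# Example with hand-built bodies: three vertices and one face using them.
vertices = parse_vertex_list(struct.pack("<H9f", 3, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0))
faces = parse_face_list(struct.pack("<H4H", 1, 0, 1, 2, 0))
print(vertices, faces)
```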


Milkshape 3D - Is ok

Milkshape 3D (ms3d) files are not bad. They have the essential bits you require for games. Yes, they are binary files, but they are fairly straightforward to parse.

I have a rudimentary parser for MS3D files in my GitHub repository.

The biggest drawbacks of this format, which made me decide not to use it, are:

  • A minor concern is that it's a bit messy to parse. Unlike 3ds, its sections are stored back to back without byte lengths you could skip by, so unless you understand how to parse most of it, you can't make use of the files.
  • Object centers are not stored with objects but in bones.
  • Many programs exporting Milkshape 3D files don't understand this; they bake the object transforms into the vertices, and you lose the object center information.
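To show why the format forces you to parse in order, here is a minimal Python sketch that reads the header, vertices and triangles. It assumes the layout from the publicly available Milkshape 3D 1.8 file format specification: a 14-byte header (`MS3D000000` plus an int32 version), a 16-bit vertex count with 15-byte vertex records, and a 16-bit triangle count with 70-byte triangle records, all little-endian and packed. `read_ms3d_geometry` is a hypothetical name, and groups, materials and joints are left unparsed. Note that each vertex carries a bone ID, not an object with its own center.

```python
import struct

# Layouts assumed from the Milkshape 3D 1.8 spec; "<" = little-endian, no padding.
HEADER = struct.Struct("<10si")           # id b"MS3D000000", version
COUNT = struct.Struct("<H")               # element count before each section
VERTEX = struct.Struct("<B3fbB")          # flags, x, y, z, boneId (-1 = none), referenceCount
TRIANGLE = struct.Struct("<H3H9f3f3fBB")  # flags, 3 indices, 3 normals, s[3], t[3], smoothingGroup, groupIndex


def read_ms3d_geometry(data):
    """Read header, vertices and triangles; return them plus the offset where groups start."""
    magic, version = HEADER.unpack_from(data, 0)
    if magic != b"MS3D000000":
        raise ValueError("not a Milkshape 3D file")
    pos = HEADER.size

    (n_vertices,) = COUNT.unpack_from(data, pos)
    pos += COUNT.size
    vertices = []
    for _ in range(n_vertices):
        _flags, x, y, z, bone_id, _refs = VERTEX.unpack_from(data, pos)
        vertices.append({"position": (x, y, z), "bone": bone_id})
        pos += VERTEX.size

    # The triangle count sits right after the last vertex; there is no offset to jump to.
    (n_triangles,) = COUNT.unpack_from(data, pos)
    pos += COUNT.size
    triangles = []
    for _ in range(n_triangles):
        fields = TRIANGLE.unpack_from(data, pos)
        triangles.append(fields[1:4])  # vertex indices only
        pos += TRIANGLE.size

    # Groups, materials and joints follow at `pos`; groups are variable-sized,
    # so reaching the joints means parsing every section in between.
    return version, vertices, triangles, pos


# Example: a hand-built file with one vertex (no bone) and one triangle.
blob = (HEADER.pack(b"MS3D000000", 4)
        + COUNT.pack(1) + VERTEX.pack(0, 1.0, 2.0, 3.0, -1, 0)
        + COUNT.pack(1) + TRIANGLE.pack(0, 0, 0, 0, *[0.0] * 15, 1, 0))
print(read_ms3d_geometry(blob))
```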


Collada - the XML format from hell

I've parsed Collada in the past. I won't go into details on how to do it, other than that you should not.

The good

Many programs can import and export Collada. The low-level format is XML (I'm no fan of that either), so you don't have to muck about with implementing an actual tokenizer/parser etc.

The bad

Collada can do everything and the kitchen sink. In fact I suspect it is a Turing complete file format. It is not simple to interpret. A game usually just needs a few simple bits of data, but getting them out of a Collada file involves multiple indirections and jumps around the entire data structure just to get the value of one vertex.
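To illustrate the indirection, here is a minimal Python sketch that fetches one vertex position from a tiny hand-written COLLADA 1.4 document using `xml.etree.ElementTree`. It follows the chain from `<triangles>` through `<vertices>`, `<source>` and `<accessor>` to `<float_array>`, and it already makes the kind of simplifying assumptions that turn into hacks: a single `<triangles>` element, X, Y, Z being the first three accessor params, and unique ids. `position`, `by_id` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"c": "http://www.collada.org/2005/11/COLLADASchema"}  # COLLADA 1.4 namespace


def by_id(root, ref):
    """Resolve a local reference such as '#verts' to the element with that id."""
    target = ref[1:] if ref.startswith("#") else ref
    for element in root.iter():
        if element.get("id") == target:
            return element
    raise KeyError(ref)


def position(root, corner=0):
    """Return (x, y, z) for one triangle corner of the first <triangles> element."""
    tris = root.find(".//c:mesh/c:triangles", NS)
    inputs = tris.findall("c:input", NS)
    vertex_input = tris.find("c:input[@semantic='VERTEX']", NS)
    stride = max(int(i.get("offset")) for i in inputs) + 1  # indices per corner in <p>
    indices = [int(v) for v in tris.find("c:p", NS).text.split()]
    index = indices[corner * stride + int(vertex_input.get("offset"))]

    vertices = by_id(root, vertex_input.get("source"))          # hop 1: <vertices>
    pos_input = vertices.find("c:input[@semantic='POSITION']", NS)
    source = by_id(root, pos_input.get("source"))               # hop 2: <source>
    accessor = source.find("c:technique_common/c:accessor", NS)
    array = by_id(root, accessor.get("source"))                 # hop 3: <float_array>

    values = [float(v) for v in array.text.split()]
    start = int(accessor.get("offset", "0")) + index * int(accessor.get("stride", "1"))
    return tuple(values[start:start + 3])  # assumes X, Y, Z are the first three params


SAMPLE = """<COLLADA xmlns="http://www.collada.org/2005/11/COLLADASchema" version="1.4.1">
  <library_geometries><geometry id="tri"><mesh>
    <source id="pos">
      <float_array id="pos-array" count="9">0 0 0 1 0 0 0 1 0</float_array>
      <technique_common>
        <accessor source="#pos-array" count="3" stride="3">
          <param name="X" type="float"/><param name="Y" type="float"/><param name="Z" type="float"/>
        </accessor>
      </technique_common>
    </source>
    <vertices id="verts"><input semantic="POSITION" source="#pos"/></vertices>
    <triangles count="1">
      <input semantic="VERTEX" source="#verts" offset="0"/>
      <p>1 2 0</p>
    </triangles>
  </mesh></geometry></library_geometries>
</COLLADA>"""

print(position(ET.fromstring(SAMPLE)))  # first corner of the triangle
```

Three lookups by id for a single coordinate, and a file that adds a second `<triangles>` element or reorders the accessor params already breaks it.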

Since Collada is completely generic, any use you make of it is riddled with hacks to make it work for your non-generic use case. Whenever you feed your interpretation of Collada a file from a different author, or one saved in a different 3D modeling program, it breaks, usually in a way that requires you to spend hours or days figuring out how to add a new hack to make it work.

There is a reference implementation of the Collada file format. However, that implementation frequently fails to import or export files. Blender uses that reference implementation, and you can get 3D Studio Max to use it. But the frequent breakage, even in the reference implementation, is a major pain.

Direct3D X - and I'm like o_O

I have not implemented a working parser for this file format. I gave up shortly after trying to tokenize it. The file format is kind of like all the disadvantages of Collada combined with no standard way to parse it and no open reference implementation.

You see, X implements its own file format grammar as part of the file (they call these sections "templates"). Unless you parse the grammar first, you can't parse the file. It's an insane approach to a file format, which is why most tutorials and examples showing you how to parse X files tell you to skip the templates, implement just the few rules that Microsoft includes as defaults, and ignore the rest.
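For a sense of what "parse the grammar first" means, here is a minimal Python sketch that tokenizes the text variant of an X file and collects its `template` declarations into member lists, which you would then need in order to read the data objects. It is an assumption-laden subset, not a real X parser: it ignores the binary and compressed variants, skips GUIDs and restriction lists like `[...]`, and the sample templates use placeholder GUIDs rather than Microsoft's. `tokenize` and `parse_templates` are hypothetical names.

```python
import re

# Tokens: GUIDs, identifiers, numbers, "..." and single-character punctuation.
TOKEN = re.compile(r"<[0-9A-Fa-f-]+>|[A-Za-z_]\w*|-?\d+(?:\.\d+)?|\.\.\.|[{}\[\];,]")


def tokenize(text):
    """Split a text-format X file into tokens, dropping // and # line comments."""
    text = re.sub(r"(//|#)[^\n]*", "", text)
    return TOKEN.findall(text)


def parse_templates(tokens):
    """Return {name: [(type, member, dims), ...]} for every `template` block."""
    templates, i = {}, 0
    while i < len(tokens):
        if tokens[i] != "template":
            i += 1
            continue
        name, i = tokens[i + 1], i + 3  # skip 'template', name and '{'
        members = []
        while tokens[i] != "}":
            if tokens[i].startswith("<"):  # the template's GUID
                i += 1
            elif tokens[i] == "[":  # restriction list such as [...] or [Mesh]
                i = tokens.index("]", i) + 1
            else:  # one member declaration, up to ';'
                end = tokens.index(";", i)
                stmt, i = tokens[i:end], end + 1
                if stmt[0] == "array":  # array TYPE name[dim]...
                    dims = [t for t in stmt[3:] if t not in ("[", "]")]
                    members.append((stmt[1], stmt[2], dims))
                else:
                    members.append((stmt[0], stmt[1], []))
        templates[name] = members
        i += 1  # skip '}'
    return templates


SAMPLE = """xof 0303txt 0032
// GUIDs below are placeholders, not the real Microsoft ones.
template Vector {
 <00000000-0000-0000-0000-000000000001>
 FLOAT x;
 FLOAT y;
 FLOAT z;
}
template Mesh {
 <00000000-0000-0000-0000-000000000002>
 DWORD nVertices;
 array Vector vertices[nVertices];
 DWORD nFaces;
 array MeshFace faces[nFaces];
 [...]
}
"""

for name, members in parse_templates(tokenize(SAMPLE)).items():
    print(name, members)
```

Reading an actual `Mesh` object then means looking up `Mesh` in this table, reading `nVertices`, and applying the `Vector` template that many times; that lookup step is what the tutorials replace with hard-coded defaults.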

Of course, 3D modeling programs invariably export custom rules, so you end up implementing a growing number of hacks for every possible tool that can touch X files.