Online release of Data-Oriented Design :
This is the free, online, reduced version. Some inessential chapters are excluded from this version, but in the spirit of this being an education resource, the essentials are present for anyone wanting to learn about data-oriented design.
Expect some odd formatting and some broken images and listings as this is auto generated and the Latex to html converters available are not perfect. If the source code listing is broken, you should be able to find the referenced source on github. If you like what you read here, consider purchasing the real paper book from here, as not only will it look a lot better, but it will help keep this version online for those who cannot afford to buy it. Please send any feedback to


Data-Oriented Design

Data-oriented design has been around for decades in one form or another but was only officially given a name by Noel Llopis in his September 2009 article[#!NoelDOD!#] of the same name. Whether it is, or is not a programming paradigm is seen as contentious. Many believe it can be used side by side with other programming paradigms such as object-oriented, procedural, or functional programming. In one respect they are right, data-oriented design can function alongside the other paradigms, but that does not preclude it from being a way to approach programming in the large. Other programming paradigms are known to function alongside each other to some extent as well. A Lisp programmer knows that functional programming can coexist with object-oriented programming and a C programmer is well aware that object-oriented programming can coexist with procedural programming. We shall ignore these comments and claim data-oriented design as another important tool; a tool just as capable of coexistence as the rest. 1.1

The time was right in 2009. The hardware was ripe for a change in how to develop. Potentially very fast computers were hindered by a hardware ignorant programming paradigm. The way game programmers coded at the time made many engine programmers weep. The times have changed. Many mobile and desktop solutions now seem to need the data-oriented design approach less, not because the machines are better at mitigating an ineffective approach, but the games being designed are less demanding and less complex. The trend for mobile seems to be moving to AAA development, which should bring the return of a need for managing complexity and getting the most out of the hardware.

As we now live in a world where multi-core machines include the ones in our pockets, learning how to develop software in a less serial manner is important. Moving away from objects messaging and getting responses immediately is part of the benefits available to the data-oriented programmer. Programming, with a firm reliance on awareness of the data flow, sets you up to take the next step to GPGPU and other compute approaches. This leads to handling the workloads that bring game titles to life. The need for data-oriented design will only grow. It will grow because abstractions and serial thinking will be the bottleneck of your competitors, and those that embrace the data-oriented approach will thrive.

It's all about the data

Data is all we have. Data is what we need to transform in order to create a user experience. Data is what we load when we open a document. Data is the graphics on the screen, the pulses from the buttons on your gamepad, the cause of your speakers producing waves in the air, the method by which you level up and how the bad guy knew where you were so as to shoot at you. Data is how long the dynamite took to explode and how many rings you dropped when you fell on the spikes. It is the current position and velocity of every particle in the beautiful scene that ended the game which was loaded off the disc and into your life via transformations by machinery driven by decoded instructions themselves ordered by assemblers instructed by compilers fed with source-code.

No application is anything without its data. Adobe Photoshop without the images is nothing. It's nothing without the brushes, the layers, the pen pressure. Microsoft Word is nothing without the characters, the fonts, the page breaks. FL Studio is worthless without the events. Visual Studio is nothing without source. All the applications that have ever been written, have been written to output data based on some input data. The form of that data can be extremely complex, or so simple it requires no documentation at all, but all applications produce and need data. If they don't need recognisable data, then they are toys or tech demos at best.

Instructions are data too. Instructions take up memory, use up bandwidth, and can be transformed, loaded, saved and constructed. It's natural for a developer to not think of instructions as being data1.2, but there is very little differentiating them on older, less protective hardware. Even though memory set aside for executables is protected from harm and modification on most contemporary hardware, this relatively new invention is still merely an invention, and the modified Harvard architecture relies on the same memory for data as it does for instructions. Instructions are therefore still data, and they are what we transform too. We take instructions and turn them into actions. The number, size, and frequency of them is something that matters. The idea that we have control over which instructions we use to solve problems leads us to optimisations. Applying our knowledge of what the data is allows us to make decisions about how the data can be treated. Knowing the outcome of instructions gives us the data to decide what instructions are necessary, which are busywork, and which can be replaced with equivalent but less costly alternatives.

This forms the basis of the argument for a data-oriented approach to development, but leaves out one major element. All this data and the transforming of data, from strings, to images, to instructions, they all have to run on something. Sometimes that thing is quite abstract, such as a virtual machine running on unknown hardware. Sometimes that thing is concrete, such as knowing which specific CPU and GPU you have, and the memory capacity and bandwidth you have available. But in all cases, the data is not just data, but data that exists on some hardware somewhere, and it has to be transformed by that same hardware. In essence, data-oriented design is the practice of designing software by developing transformations for well-formed data where the criteria for well-formed is guided by the target hardware and the patterns and types of transforms that need to operate on it. Sometimes the data isn't well defined, and sometimes the hardware is equally evasive, but in most cases a good background of hardware appreciation can help out almost every software project.

If the ultimate result of an application is data, and all input can be represented by data, and it is recognised that all data transforms are not performed in a vacuum, then a software development methodology can be founded on these principles; the principles of understanding the data, and how to transform it given some knowledge of how a machine will do what it needs to do with data of this quantity, frequency, and its statistical qualities. Given this basis, we can build up a set of founding statements about what makes a methodology data-oriented.

Data is not the problem domain

The first principle: Data is not the problem domain.

For some, it would seem that data-oriented design is the antithesis of most other programming paradigms because data-oriented design is a technique that does not readily allow the problem domain to enter into the software as written in source. It does not promote the concept of an object as a mapping to the context of the user in any way, as data is intentionally and consistently without meaning. Abstraction heavy paradigms try to pretend the computer and its data do not exist at every turn, abstracting away the idea that there are bytes, or CPU pipelines, or other hardware features, and instead bringing the model of the problem into the program. They regularly bring either the model of the view into the code, or the model of the world as a context for the problem. That is, they either structure the code around attributes of the expected solution, or they structure the code around the description of the problem domain.

Meaning can be applied to data to create information. Meaning is not inherent in data. When you say 4, it means very little, but say 4 miles, or 4 eggs, it means something. When you have 3 numbers, they mean very little as a tuple, but when you name them x,y,z, you can put meaning on them as a position. When you have a list of positions in a game, they mean very little without context. Object-oriented design would likely have the positions as part of an object, and by the class name and neighbouring data (also named) you can get an idea of what that data means. Without the connected named contextualising data, the positions could be interpreted in a number of different ways, and though putting the numbers in context is good in some sense, it also blocks thinking about the positions as just sets of three numbers, which can be important for thinking of solutions to the real problems the programmers are trying to solve.

For an example of what can happen when you put data so deep inside an object that you forget its impact, consider the numerous games released, and in production, where a 2D or 3D grid system could have been used for the data layout, but for unknown reasons the developers kept with the object paradigm for each entity on the map. This isn't a singular event, and real shipping games have seen this object-centric approach commit crimes against the hardware by having hundreds of objects placed in WorldSpace at grid coordinates, rather than actually being driven by a grid. It's possible that programmers look at a grid, and see the number of elements required to fulfil the request, and are hesitant to the idea of allocating it in a single lump of memory. Consider a simple 256 by 256 tilemap requiring 65,536 tiles. An object-oriented programmer may think about those sixty-five thousand objects as being quite expensive. It might make more sense for them to allocate the objects for the tiles only when necessary, even to the point where there literally are sixty-five thousand tiles created by hand in editor, but because they were placed by hand, their necessity has been established, and they are now something to be handled, rather than something potentially worrying.

Not only is this pervasive lack of an underlying form a poor way to handle rendering and simple element placement, but it leads to much higher complexity when interpreting locality of elements. Gaining access to elements on a grid-free representation often requires jumping through hoops such as having neighbour links (which need to be kept up to date), running through the entire list of elements (inherently costly), or references to an auxiliary augmented grid object or spatial mapping system connecting to the objects which are otherwise free to move, but won't, due to the design of the game. This fake form of freedom introduced by the grid-free design presents issues with understanding the data, and has been the cause of some significant performance penalties in some titles. Thus also causing a significant waste of programmer mental resources in all.

Other than not having grids where they make sense, many modern games also seem to carry instances for each and every item in the game. An instance for each rather than a variable storing the number of items. For some games this is an optimisation, as creation and destruction of objects is a costly activity, but the trend is worrying, as these ways of storing information about the world make the world impenetrable to simple interrogation.

Many games seem to try to keep everything about the player in the player class. If the player dies in-game, they have to hang around as a dead object, otherwise, they lose access to their achievement data. This linking of what the data is, to where it resides and what it shares lifetime with, causes monolithic classes and hard to untangle relationships which frequently turn out to be the cause of bugs. I will not name any of the games, but it's not just one title, nor just one studio, but an epidemic of poor technical design that seems to infect those who use off the shelf object-oriented engines more than those who develop their own regardless of paradigm.

The data-oriented design approach doesn't build the real-world problem into the code. This could be seen as a failing of the data-oriented approach by veteran object-oriented developers, as examples of the success of object-oriented design come from being able to bring the human concepts to the machine, then in this middle ground, a solution can be written that is understandable by both human and computer. The data-oriented approach gives up some of the human readability by leaving the problem domain in the design document, bringing elements of constraints and expectations into the transforms, but stops the machine from having to handle human concepts at any data level by just that same action.

Let us consider how the problem domain becomes part of the software in programming paradigms that promote needless abstraction. In the case of objects, we tie meanings to data by associating them with their containing classes and their associated functions. In high-level abstraction, we separate actions and data by high-level concepts, which might not apply at the low level, thus reducing the likelihood the functions can be implemented efficiently.

When a class owns some data, it gives that data a context which can sometimes limit the ability to reuse the data or understand the impact of operations upon it. Adding functions to a context can bring in further data, which quickly leads to classes containing many different pieces of data that are unrelated in themselves, but need to be in the same class because an operation required a context and the context required more data for other reasons such as for other related operations. This sounds awfully familiar, and Joe Armstrong is quoted to have said ``I think the lack of reusability comes in object-oriented languages, not functional languages. Because the problem with object-oriented languages is they've got all this implicit environment that they carry around with them. You wanted a banana but what you got was a gorilla holding the banana and the entire jungle."1.3 which certainly seems to resonate with the issue of contextual referencing that seems to be plaguing the object-oriented languages.

You could be forgiven for believing that it's possible to remove the connections between contexts by using interfaces or dependency injection, but the connections lie deeper than that. The contexts in the objects are often connecting different classes of data about different categories in which the object fits. Consider how this banana has many different purposes, from being a fruit, to being a colour, to being a word beginning with the letter B. We have to consider the problem presented by the idea of the banana as an instance, as well as the banana being a class of entity too. If we need to gain information about bananas from the point of view of the law on imported goods, or about its nutritional value, it's going to be different from information about how many we are currently stocking. We were lucky to start with the banana. If we talk about the gorilla, then we have information about the individual gorilla, the gorillas in the zoo or jungle, and the class of gorilla too. This is three different layers of abstraction about something which we might give one name. At least with a banana, each individual doesn't have much in the way of important data. We see this kind of contextual linkage all the time in the real world, and we manage the complexity very well in conversation, but as soon as we start putting these contexts down in hard terms we connect them together and make them brittle.

All these mixed layers of abstraction become hard to untangle as functions which operate over each context drag in random pieces of data from all over the classes meaning many data items cannot be removed as they would then be inaccessible. This can be enough to stop most programmers from attempting large-scale evolving software projects, but there is another issue caused by hiding the actions applied to the data that leads to unnecessary complexity. When you see lists and trees, arrays and maps, tables and rows, you can reason about them and their interactions and transformations. If you attempt to do the same with homes and offices, roads and commuters, coffee shops and parks, you can often get stuck in thinking about the problem domain concepts and not see the details that would provide clues to a better data representation or a different algorithmic approach.

There are very few computer science algorithms that cannot be reused on primitive data types, but when you introduce new classes with their own internal layouts of data, that don't follow clearly in the patterns of existing data-structures, then you won't be able to fully utilise those algorithms, and might not even be able to see how they would apply. Putting data structures inside your object designs might make sense from what they are, but they often make little sense from the perspective of data manipulation.

When we consider the data from the data-oriented design point of view, data is mere facts that can be interpreted in whatever way necessary to get the output data in the format it needs to be. We only care about what transforms we do, and where the data ends up. In practice, when you discard meanings from data, you also reduce the chance of tangling the facts with their contexts, and thus you also reduce the likelihood of mixing unrelated data just for the sake of an operation or two.

Data and statistics

The second principle: Data is the type, frequency, quantity, shape, and probability.

The second statement is that data is not just the structure. A common misconception about data-oriented design is that it's all about cache misses. Even if it was all about making sure you never missed the cache, and it was all about structuring your classes so the hot and cold data was split apart, it would be a generally useful addition to your programming toolkit, but data-oriented design is about all aspects of the data. To write a book on how to avoid cache misses, you need more than just some tips on how to organise your structures, you need a grounding in what is really happening inside your computer when it is running your program. Teaching that in a book is also impossible as it would only apply to one generation of hardware, and one generation of programming languages, however, data-oriented design is not rooted in just one language and just some unusual hardware, even though the language to best benefit from it is C++, and the hardware to benefit the approach the most is anything with unbalanced bottlenecks. The schema of the data is important, but the values and how the data is transformed are as important, if not more so. It is not enough to have some photographs of a cheetah to determine how fast it can run. You need to see it in the wild and understand the true costs of being slow.

The data-oriented design model is centred around data. It pivots on live data, real data, data that is also information. Object-oriented design is centred around the problem definition. Objects are not real things but abstract representations of the context in which the problem will be solved. The objects manipulate the data needed to represent them without any consideration for the hardware or the real-world data patterns or quantities. This is why object-oriented design allows you to quickly build up first versions of applications, allowing you to put the first version of the design document or problem definition directly into the code, and make a quick attempt at a solution.

Data-oriented design takes a different approach to the problem, instead of assuming we know nothing about the hardware, it assumes we know little about the true nature of our problem, and makes the schema of the data a second-class citizen. Anyone who has written a sizeable piece of software may recognise that the technical structure and the design for a project often changes so much that there is barely any section from the first draft remaining unchanged in the final implementation. Data-oriented design avoids wasting resources by never assuming the design needs to exist anywhere other than in a document. It makes progress by providing a solution to the current problem through some high-level code controlling sequences of events and specifying schema in which to give temporary meaning to the data.

Data-oriented design takes its cues from the data which is seen or expected. Instead of planning for all eventualities, or planning to make things adaptable, there is a preference for using the most probable input to direct the choice of algorithm. Instead of planning to be extendable, it plans to be simple and replaceable, and get the job done. Extendable can be added later, with the safety net of unit tests to ensure it remains working as it did while it was simple. Luckily, there is a way to make your data layout extendable without requiring much thought, by utilising techniques developed many years ago for working with databases.

Database technology took a great turn for the positive when the relational model was introduced. In the paper Out of the Tar Pit[#!TarPit!#], Functional Relational Programming takes it a step further when it references the idea of using relational model data-structures with functional transforms. These are well defined, and much literature on how to adapt their form to match your requirements is available.

Data can change

Data-oriented design is current. It is not a representation of the history of a problem or a solution that has been brought up to date, nor is it the future, with generic solutions made up to handle whatever will come along. Holding onto the past will interfere with flexibility, and looking to the future is generally fruitless as programmers are not fortune tellers. It's the opinion of the author, that future-proof systems rarely are. Object-oriented design starts to show its weaknesses when designs change in the real-world.

Object-oriented design is known to handle changes to underlying implementation details very well, as these are the expected changes, the obvious changes, and the ones often cited in introductions to object-oriented design. However, real world changes such as change of user's needs, changes to input format, quantity, frequency, and the route by which the information will travel, are not handled with grace. It was introduced in On the Criteria To Be Used in Decomposing Systems into Modules[#!OnTheCriteria!#] that the modularisation approach used by many at the time was rather like that of a production line, where elements of the implementation are caught up in the stages of the proposed solution. These stages themselves would be identified with a current interpretation of the problem. In the original document, the solution was to introduce a data hiding approach to modularisation, and though it was an improvement, in the later book Software Pioneers: Contributions to Software Engineering[#!OnTheCriteriaLate!#], D. L. Parnas revisits the issue and reminds us that even though initial software development can be faster when making structural decisions based on business facts, it lays a burden on maintenance and evolutionary development. Object-oriented design approaches suffer from this inertia inherent in keeping the problem domain coupled with the implementation. As mentioned, the problem domain, when introduced into the implementation, can help with making decisions quickly, as you can immediately see the impact the implementation will have on getting closer to the goal of solving or working with the problem in its current form. The problem with object-oriented design lies in the inevitability of change at a higher level.

Designs change for multiple reasons, occasionally including times when they actually haven't. A misunderstanding of a design, or a misinterpretation of a design, will cause as much change in the implementation as a literal request for change of design. A data-oriented approach to code design considers the change in design through the lens of understanding the change in the meaning of the data. The data-oriented approach to design also allows for change to the code when the source of data changes, unlike the encapsulated internal state manipulations of the object-oriented approach. In general, data-oriented design handles change better as pieces of data and transforms can be more simply coupled and decoupled than objects can be mutated and reused.

The reason this is so, comes from linking the intention, or the aspect, to the data. When lumping data and functions in with concepts of objects, you find the objects are the schema of the data. The aspect of the data is linked to that object, which means it's hard to think of the data from another point of view. The use case of the data, and the real-world or design, are now linked to the data layout through a singular vision implied by the object definition. If you link your data layout to the union of the required data for your expected manipulations, and your data manipulations are linked by aspects of your data, then you make it hard to unlink data related by aspect. The difficulty comes when different aspects need different subsets of the data, and they overlap. When they overlap, they create a larger and larger set of values that need to travel around the system as one unit. It's common to refactor a class out into two or more classes, or give ownership of data to a different class. This is what is meant by tying data to an aspect. It is tied to the lens through which the data has purpose, but with static typed objects that purpose is predefined, a union of multiple purposes, and sometimes carries around defunct relationships. Some purposes may no longer required by the design. Unfortunately, it's easier to see when a relationship needs to exist, than when it doesn't, and that leads to more connections, not fewer, over time.

If you link your operations by related data, such as when you put methods on a class, you make it hard to unlink your operations when the data changes or splits, and you make it hard to split data when an operation requires the data to be together for its own purposes. If you keep your data in one place, operations in another place, and keep the aspects and roles of data intrinsic from how the operations and transforms are applied to the data, then you will find that many times when refactoring would have been large and difficult in object-oriented code, the task now becomes trivial or non-existent. With this benefit comes a cost of keeping tabs on what data is required for each operation, and the potential danger of de-synchronisation. This consideration can lead to keeping some cold code in an object-oriented style where objects are responsible for maintaining internal consistency over efficiency and mutability. Examples of places where object-oriented design is far superior to data-oriented can be that of driver layers for systems or hardware. Even though Vulkan and OpenGL are object-oriented, the granularity of the objects is large and linked to stable concepts in their space, just like the object-oriented approach of the FILE type or handle, in open, close, read, and write operations in filesystems.

A big misunderstanding for many new to the data-oriented design paradigm, a concept brought over from abstraction based development, is that we can design a static library or set of templates to provide generic solutions to everything presented in this book as a data-oriented solution. Much like with domain driven design, data-oriented design is product and work-flow specific. You learn how to do data-oriented design, not how to add it to your project. The fundamental truth is that data, though it can be generic by type, is not generic in how it is used. The values are different and often contain patterns we can turn to our advantage. The idea that data can be generic is a false claim that data-oriented design attempts to rectify. The transforms applied to data can be generic to some extent, but the order and selection of operations are literally the solution to the problem. Source code is the recipe for conversion of data from one form into another. There cannot be a library of templates for understanding and leveraging patterns in the data, and that's what drives a successful data-oriented design. It's true we can build algorithms to find patterns in data, otherwise, how would it be possible to do compression, but the patterns we think about when it comes to data-oriented design are higher level, domain-specific, and not simple frequency mappings.

Our run-time benefits from specialisation through performance tricks that sometimes make the code harder to read, but it is frequently discouraged as being not object-oriented, or being too hard-coded. It can be better to hard-code a transform than to pretend it's not hard-coded by wrapping it in a generic container and using less direct algorithms on it. Using existing templates like this provides a benefit of an increase in readability for those who already know the library, and potentially fewer bugs if the functionality was in some way generic. But, if the functionality was not well mapped to the existing generic solution, writing it with a function template and then extending will make the code harder to understand. Hiding the fact that the technique had been changed subtly will introduced false assumptions. Hard-coding a new algorithm is a better choice as long as it has sufficient tests, and is objectively new. Tests will also be easier to write if you constrain yourself to the facts about concrete data and only test with real, but simple data for your problem, and not generic types on generic data.

How is data formed?

The games we write have a lot of data, in a lot of different formats. We have textures in multiple formats for multiple platforms. There are animations, usually optimised for different skeletons or types of playback. There are sounds, lights, and scripts. Don't forget meshes, they consist of multiple buffers of attributes. Only a very small proportion of meshes are old fixed function type with vertices containing positions, UVs, and normals. The data in game development is hard to box, and getting harder to pin down as more ideas which were previously considered impossible have now become commonplace. This is why we spend a lot of time working on editors and tool-chains, so we can take the free-form output from designers and artists and find a way to put it into our engines. Without our tool-chains, editors, viewers, and tweaking tools, there would be no way we could produce a game with the time we have. The object-oriented approach provides a good way to wrap our heads around all these different formats of data. It gives a centralised view of where each type of data belongs and classifies it by what can be done to it. This makes it very easy to add and use data quickly, but implementing all these different wrapper objects takes time. Adding new functionality to these objects can sometimes require large amounts of refactoring as occasionally objects are classified in such a way that they don't allow for new features to exist. For example, in many old engines, textures were always 1,2, or 4 bytes per pixel. With the advent of floating point textures, all that code required a minor refactoring. In the past, it was not possible to read a texture from the vertex shader, so when texture based skinning came along, many engine programmers had to refactor their render update. They had to allow for a vertex shader texture upload because it might be necessary when uploading transforms for rendering a skinned mesh. When the PlayStation2 came along, or an engine first used shaders, the very idea of what made a material had to change. In the move from small 3D environments to large open worlds with level of detail caused many engineers to start thinking about what it meant for something to need rendering. When newer hardware became more picky about alignment, other hard to inject changes had to be made. In many engines, mesh data is optimised for rendering, but when you have to do mesh ray casting to see where bullets have hit, or for doing IK, or physics, then you need multiple representations of an entity. At this point, the object-oriented approach starts to look cobbled together as there are fewer objects that represent real things, and more objects used as containers so programmers can think in larger building blocks. These blocks hinder though, as they become the only blocks used in thought, and stop potential mental connections from happening. We went from 2D sprites to 3D meshes, following the format of the hardware provider, to custom data streams and compute units turning the streams into rendered triangles. Wave data, to banks, to envelope controlled grain tables and slews of layered sounds. Tilemaps, to portals and rooms, to streamed, multiple levels of detail chunks of world, to hybrid mesh palette, props, and unique stitching assets. From flip-book to Euler angle sequences, to quaternions and spherical interpolated animations, to animation trees and behaviour mapping/trees. Change is the only constant.

All these types of data are pretty common if you've worked in games at all, and many engines do provide an abstraction to these more fundamental types. When a new type of data becomes heavily used it is promoted into engines as a core type. We normally consider the trade-off of new types being handled as special cases until they become ubiquitous to be one of usability vs performance. We don't want to provide free access to the lesser understood elements of game development. People who are not, or can not, invest time in finding out how best to use new features, are discouraged from using them. The object-oriented game development way to do that is to not provide objects which represent them, and instead only offer the features to people who know how to utilise the more advanced tools.

Apart from the objects representing digital assets, there are also objects for internal game logic. For every game, there are objects which only exist to further the game-play. Collectable card games have a lot of textures, but they also have a great deal of rules, card stats, player decks, match records, with many objects to represent the current state of play. All of these objects are completely custom designed for one game. There may be sequels, but unless it's primarily a re-skin, it will use quite different game logic in many places, and therefore require different data, which would imply different methods on the now guaranteed to be internally different objects.

Game data is complex. Any first layout of the data is inspired by the game's initial design. Once development is underway, the layout needs to keep up with whichever way the game evolves. Object-oriented techniques offer a quick way to implement any given design, are very quick at implementing each singular design in turn, but don't offer a clean or graceful way to migrate from one data schema to the next. There are hacks, such as those used in version based asset handlers, or in frameworks backed by update systems and conversion scripts, but normally, game developers change the tool-chain and the engine at the same time, do a full re-export of all the assets, then commit to the next version all in one go. This can be quite a painful experience if it has to happen over multiple sites at the same time, or if you have a lot of assets, or if you are trying to provide engine support for more than one title, and only one wants to change to the new revision. An example of an object-oriented approach that handles migration of design with some grace is the Django framework, but the reason it handles the migration well is that the objects would appear to be views into data models, not the data itself.

There have not yet been any successful efforts to build a generic game asset solution. This may be because all games differ in so many subtle ways that if you did provide a generic solution, it wouldn't be a game solution, just a new language. There is no solution to be found in trying to provide all the possible types of object a game can use. But, there is a solution if we go back to thinking about a game as merely running a set of computations on some data. The closest we can get in 2018 is the FBX format, with some dependence on the current standard shader languages. The current solutions appear to have excess baggage which does not seem easy to remove. Due to the need to be generic, many details are lost through abstractions and strategies to present data in a non-confrontational way.

What can provide a computational framework for such complex data?

Game developers are notorious for thinking about game development from either a low level all out performance perspective or from a very high-level gameplay and interaction perspective. This may have come about because of the widening gap between the amount of code that has to be high performance, and the amount of code to make the game complete. Object-oriented techniques provide good coverage of the high-level aspect, so the high-level programmers are content with their tools. The performance specialists have been finding ways of doing more with the hardware, so much so that a lot of the time content creators think they don't have a part in the optimisation process. There has never been much of a middle ground in game development, which is probably the primary reason why the structure and performance techniques employed by big-iron companies didn't seem useful. The secondary reason could be that game developers don't normally develop systems and applications which have decade-long maintenance expectations1.4 and therefore are less likely to be concerned about why their code should be encapsulated and protected or at least well documented. When game development was first flourishing into larger studios in the late 1990's, academic or corporate software engineering practices were seen as suspicious because wherever they were employed, there was a dramatic drop in game performance, and whenever any prospective employees came from those industries, they failed to impress. As games machines became more like the standard micro-computers, and standard micro-computers drew closer in design to the mainframes of old, the more apparent it became that some of those standard professional software engineering practices could be useful. Now the scale of games has grown to match the hardware, but the games industry has stopped looking at where those non-game development practices led. As an industry, we should be looking to where others have gone before us, and the closest set of academic and professional development techniques seem to be grounded in simulation and high volume data analysis. We still have industry-specific challenges such as the problems of high frequency highly heterogeneous transformational requirements that we experience in sufficiently voluminous AI environments, and we have the issue of user proximity in networked environments, such as the problems faced by MMOs when they have location-based events, and bandwidth starts to hit $ n^2$ issues as everyone is trying to message everyone else.

With each successive generation, the number of developer hours to create a game has grown, which is why project management and software engineering practices have become standardised at the larger games companies. There was a time when game developers were seen as cutting-edge programmers, inventing new technology as the need arises, but with the advent of less adventurous hardware (most notably in the x86 based recent 8thgenerations), there has been a shift away from ingenious coding practices, and towards a standardised process. This means game development can be tuned to ensure the release date will coincide with marketing dates. There will always be an element of randomness in high profile game development. There will always be an element of innovation that virtually guarantees you will not be able to predict how long the project, or at least one part of the project, will take. Even if data-oriented design isn't needed to make your game go faster, it can be used to make your game development schedule more regular.

Part of the difficulty in adding new and innovative features to a game is the data layout. If you need to change the data layout for a game, it will need objects to be redesigned or extended in order to work within the existing framework. If there is no new data, then a feature might require that previously separate systems suddenly be able to talk to each other quite intimately. This coupling can often cause system-wide confusion with additional temporal coupling and corner cases so obscure they can only be reproduced one time in a million. These odds might sound fine to some developers, but if you're expecting to sell five to fifty million copies of your game, at one in a million, that's five to fifty people who will experience the problem, can take a video of your game behaving oddly, post it on the YouTube, and call your company rubbish, or your developers lazy, because they hadn't fixed an obvious bug. Worse, what if the one in a million issue was a way to circumvent in-app-purchases, and was reproducible if you knew what to do and the steps start spreading on Twitter, or maybe created an economy-destroying influx of resources in a live MMO universe1.5. In the past, if you had sold five to fifty million copies of your game, you wouldn't care, but with the advent of free-to-play games, five million players might be considered a good start, and poor reviews coming in will curb the growth. IAP circumventions will kill your income, and economy destruction will end you.

Big iron developers had these same concerns back in the 1970's. Their software had to be built to high standards because their programs would frequently be working on data concerned with real money transactions. They needed to write business logic that operated on the data, but most important of all, they had to make sure the data was updated through a provably careful set of operations in order to maintain its integrity. Database technology grew from the need to process stored data, to do complex analysis on it, to store and update it, and be able to guarantee it was valid at all times. To do this, the ACID test was used to ensure atomicity, consistency, isolation, and durability. Atomicity was the test to ensure all transactions would either complete or do nothing. It could be very bad for a database to update only one account in a financial transaction. There could be money lost or created if a transaction was not atomic. Consistency was added to ensure all the resultant state changes which should happen during a transaction do happen, that is, all triggers which should fire, do fire, even if the triggers cause triggers recursively, with no limit. This would be highly important if an account should be blocked after it has triggered a form of fraud detection. If a trigger has not fired, then the company using the database could risk being liable for even more than if they had stopped the account when they first detected fraud. Isolation is concerned with ensuring all transactions which occur cannot cause any other transactions to differ in behaviour. Normally this means that if two transactions appear to work on the same data, they have to queue up and not try to operate at the same time. Although this is generally good, it does cause concurrency problems. Finally, durability. This was the second most important element of the four, as it has always been important to ensure that once a transaction has completed, it remains so. In database terminology, durability meant the transaction would be guaranteed to have been stored in such a way that it would survive server crashes or power outages. This was important for networked computers where it would be important to know what transactions had definitely happened when a server crashed or a connection dropped.

Modern networked games also have to worry about highly important data like this. With non-free downloadable content, consumers care about consistency. With consumable downloadable content, users care a great deal about every transaction. To provide much of the functionality required of the database ACID test, game developers have gone back to looking at how databases were designed to cope with these strict requirements and found reference to staged commits, idempotent functions, techniques for concurrent development, and a vast literature base on how to design tables for a database.

Conclusions and takeaways

We've talked about data-oriented design being a way to think about and lay out your data and to make decisions about your architecture. We have two principles that can drive many of the decisions we need to make when doing data-oriented design. To finish the chapter, there are some takeaways you can use immediately to begin your journey.

Consider how your data is being influenced by what it's called. Consider the possibility that the proximity of other data can influence the meaning of your data, and in doing so, trap it in a model that inhibits flexibility. For the consideration of the first principle, data is not the problem domain, it's worth thinking about the following items.


You are not targeting an unknown device with unknowable characteristics. Know your data, and know your target hardware. To some extent, understand how much each stream of data matters, and who is consuming it. Understand the cost and potential value of improvements. Access patterns matter, as you cannot hit the cache if you're accessing things in a burst, then not touching them again for a whole cycle of the application. For the consideration of the second principle, data is the type, frequency, quantity, shape, and probability, it's worth thinking about the following items.


Online release of Data-Oriented Design :
This is the free, online, reduced version. Some inessential chapters are excluded from this version, but in the spirit of this being an education resource, the essentials are present for anyone wanting to learn about data-oriented design.
Expect some odd formatting and some broken images and listings as this is auto generated and the Latex to html converters available are not perfect. If the source code listing is broken, you should be able to find the referenced source on github. If you like what you read here, consider purchasing the real paper book from here, as not only will it look a lot better, but it will help keep this version online for those who cannot afford to buy it. Please send any feedback to

Richard Fabian 2018-10-08