DATA-ORIENTED DESIGN The hardware will thank you.


Read articles on CellPerformance

Data-Oriented Design book - online beta


Far Cry 3 - SPU effectiveness 18/06/2012:13:40:13

In Praise of Idleness 18/06/2012:13:25:56

Check out this article by Bruce Dawson on the many types of waiting.

Adding intrinsics to Dart. 18/06/2012:13:25:04

It's never too late to add intrinsics to a language. If we're to continue using non-native languages in our browsers, adding SIMD and other hardware-oriented features will save us some energy and money. John McCutchan has done just this: bringing-simd-accelerated-vector-math

An introduction to lock-free programming 18/06/2012:12:05:34

Preshing writes well and about subjects that matter. See his post on what lock-free really means and how to get started here.

Things you must read. 13/06/2012:16:04:43

  • DATA-ORIENTED DESIGN by Noel Llopis.
    The Game Developer magazine article published in September 2009 that started it all.
  • Pitfalls of Object Oriented Programming by Tony Albrecht
    The PDF of slides that woke many game developers up to the potential problems being caused by continuing the trend of more and more Object-oriented code without considering the hardware.
  • Typical C++ bullshit by Mike Acton
    An interesting take on how to make a slide presentation with equally interesting content. Be sure to check out the rest of his SmugMug gallery of tips on concurrent software design.

Data movement is what matters, not arithmetic 12/06/2012:16:31:56

November 1, 2006 lecture by William Dally for the Stanford University Computer Systems Colloquium (EE 380).

A discussion about the exploration of parallelism and locality with examples drawn from the Imagine and Merrimac projects and from three generations of stream programming systems.

Andrew Richards talks about OpenCL's future. 12/06/2012:16:05:10

The Web Will Die When OOP Dies 10/06/2012:21:41:21

Zed Shaw presents a great talk about the web and how OOP is causing pain.

not just the shape or content of the data 

time: but also how it gets there.

input is data 04/05/2012:21:07:54

Very interesting read, but one of the understated takeaways is that playing a game is data. Playing a game means generating data from data. Lekktor took this principle to the extreme: it took the player's input as the data that decides how to morph the code.

It's crazy, but it's something to think about. The data in the form of the player input counts. Player input was used to measure code coverage, and to some extent, this is why automated bot tests can return you bogus performance profiles. If using Lekktor was taken for granted, what would be necessary to make it not a crazy idea?

The first step could be to introduce unit tests of a different sort. For everything that the game can do, the unit test would make the game do it so Lekktor wouldn't forget about it. If someone finds a feature missing from the final game, then you missed a unit test. Also, knowing that Lekktor won't let your code live without a unit test would provoke you into writing said test, which wouldn't be a bad thing at all, now would it?

There are some other things to think about too: if a player is unlikely to do something, then we all know it's more likely to be buggy because it's less likely to be tested, but also, things that are less likely deserve less developer time. In turn this allows us to make trade-offs. For example, it's seen as quite natural to ignore framerate issues in areas that the player is unlikely to see in favour of fixing the framerate in areas the player is highly likely to see. Lekktor allows us another view of the code. It can tell us what areas of the code are used little, and from that we can deduce what areas are potentially more dangerous than others.

During development, it's important to have all the optional but not actually used code paths available, but in a final build, not just the debugging code should be eradicated, but also all the code that was only used by the debug code. Lekktor could potentially be that tool, but only after all the crazy is taken out.

A slow realisation 01/05/2012:23:07:37

Chris Turner reveals how he realised over a number of years that the advertised features of Object Oriented design don't quite match up with the reality while taking on a more and more functional approach to development.

When you can change to match the data, you can be more efficient. 17/04/2012:15:07:50

Remember time when doing analysis of space 08/04/2012:12:12:31

Some elements of development have time and space tied to each other in such a literal way that it's hard to think of a reason not to worry about both at the same time.

Take asset compression.

For physical media, there is a certain amount of compression required in order to fit your game on a disc. Without any compression at all, many games would either take up multiple discs, or just not have much content. With compression, load times go down and processor usage goes up. But how much compression is wanted? There are some extremely high compression ratio algorithms around that would allow some multi-DVD titles to fit on one disc, but would it be worth investing the time and effort in them?

The only way to find out is to look at the data, in this case, the time it takes to load an asset from disc vs the time it takes to decompress it. If the time to decompress is less than the time to read, then it's normally safe to assume you can try a more complex compression algorithm, but that's only in the case where you have the compute resources to do the decompression.

Imagine a game where the level is loaded, and play commences. In this environment, the likelihood that you would have a lot of compute resources available during the asset loading is very high indeed. Loading screens cover the fact that the CPUs/GPUs are being fully utilised in mitigating disc throughput limits.

Imagine a free-roaming game where once you're in the world, there are no loading screens. In this environment, the likelihood of good compute resources going spare is low, so decompression algorithms have to be light-weight and assets need to be built so that streaming content is simple and fast too.
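The comparison between read time and decompression time can be sketched in a few lines. This is only an illustration under assumptions: the `AssetTiming` struct, the `can_afford_heavier_compression` function and the `spare_compute` fraction are hypothetical names, and the timings would have to come from measurements on the target hardware in the state it will be in when the asset is needed.

```cpp
#include <cassert>

// Hypothetical measurements for one asset, in milliseconds.
struct AssetTiming {
    double read_ms;        // time to read the compressed bytes from disc
    double decompress_ms;  // time to decompress them with a whole core free
};

// True when decompression would finish before the read does, meaning the disc
// is the bottleneck and a heavier codec might be worth trying.
// spare_compute is the assumed fraction (0..1] of a core that is free while
// loading: close to 1.0 behind a loading screen, far less when streaming.
bool can_afford_heavier_compression(const AssetTiming& t, double spare_compute) {
    assert(spare_compute > 0.0 && spare_compute <= 1.0);
    const double effective_decompress_ms = t.decompress_ms / spare_compute;
    return effective_decompress_ms < t.read_ms;
}
```

With the same asset (40 ms to read, 10 ms to decompress) the answer flips from yes behind a loading screen to no when only a fifth of a core is spare, which is the point: the data alone does not decide it, the state of the system does too.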

Always consider how the data is going to be used, and also what state the system is in when it is going to use it. Testing your technologies in isolation is a sure-fire way to give yourself a horrible crunch period where you try to stick it all together at the end.

Government of data 07/04/2012:16:26:27

The UK government website is promoting a new set of design principles that can just as well apply to many other forms of design where data needs to be understood in order to be consumed. Anyone creating tools to help visualise data in any field can take cues from this resource. The website itself attempts to conform to its own rules, minimising any effort on the part of the user, maintaining readability through scaling of glyphs and simple but active and non-intrusive elements in the page.

Separation of compute, control and transfer 21/03/2012:18:06:48

Every time you use an accumulator or a temporary variable, your potential for concurrency suddenly drops.

This short article goes over some of the higher level auto-parallelising languages that attempt to leverage the power of GPGPU, but are hindered in scalability by their attempt to give the programmer what they are used to, rather than what they really need and no more.
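The cost of an accumulator is easy to see in a small sketch. The functions below are illustrative only (the names `serial_sum` and `chunked_sum` are made up for this example): the first carries one value through every iteration, the second gives each chunk its own partial sum so the chunks share no state.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One accumulator: every iteration waits on the result of the previous one.
float serial_sum(const std::vector<float>& in) {
    float acc = 0.0f;
    for (float v : in) acc += v;
    return acc;
}

// One partial sum per chunk: the outer-loop iterations are independent, so
// each could be handed to a different core or GPU work-group. Only the short
// combine at the end is serial.
float chunked_sum(const std::vector<float>& in, std::size_t chunk) {
    assert(chunk > 0);
    const std::size_t chunks = (in.size() + chunk - 1) / chunk;
    std::vector<float> partial(chunks, 0.0f);
    for (std::size_t c = 0; c < chunks; ++c) {  // independent iterations
        const std::size_t begin = c * chunk;
        const std::size_t end = begin + chunk < in.size() ? begin + chunk : in.size();
        for (std::size_t i = begin; i < end; ++i) partial[c] += in[i];
    }
    float total = 0.0f;
    for (float p : partial) total += p;         // small serial combine
    return total;
}
```

The sketch still runs the chunks one after another; it only shows the shape that makes concurrency possible. Note that regrouping floating-point additions can change rounding, so the two functions are only guaranteed to agree exactly on inputs where every intermediate sum is representable.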

Know your data 10/02/2012:20:05:04

... shall be the whole law

Michael Bosley posted to his blog about the importance of appreciating the data: not what it is, how big it is, or what values it holds this time, but when it arrives. The time of arrival of specific values of data was causing an AVL tree to fall into one of its worst-case scenarios. Algorithmic complexity may have big O notation for quick ballpark estimations of which algorithm to apply to a given situation, but you need to understand why and what causes the best and worst cases before you can assume you know what is the best algorithm for your particular case.

Analysing data 03/02/2012:16:24:42

Most of the time you will be analysing data produced by a machine, but when you're not, such as when you're working with assets, good data tools can be invaluable.

Google Refine gives you the tools to determine the information in malformed data. Almost by accident, Google Refine also teaches you how to understand the power of map-reduce, and how to use it to create the filters that generate the data you need.

High level Data-oriented design starts with going component oriented 23/01/2012:19:56:23

And to prove that it isn't just about throwing away your high level languages and embracing C, TyphonRT does data-oriented design in Java for Android.

see the website for even more:

Bit Twiddling Hacks 19/01/2012:11:54:29

A useful page full of code snippets. Many here are ripe for use in in-order, or superscalar, processors.
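A few snippets in the spirit of that page, as a sketch rather than a copy of it. The function names are chosen for this example; the techniques (the single-set-bit test, the mask-based minimum and Kernighan's bit count) are well-known bit tricks.

```cpp
#include <cassert>
#include <cstdint>

// True when exactly one bit is set (zero is not a power of two).
bool is_power_of_two(std::uint32_t v) {
    return v != 0 && (v & (v - 1)) == 0;
}

// Minimum without a conditional jump: the comparison yields 0 or 1, negating
// it gives a mask of all zeros or all ones, and the mask selects x or y.
std::int32_t branchless_min(std::int32_t x, std::int32_t y) {
    return y ^ ((x ^ y) & -static_cast<std::int32_t>(x < y));
}

// Kernighan's bit count: each pass clears the lowest set bit, so the loop
// runs once per set bit rather than once per bit position.
int count_set_bits(std::uint32_t v) {
    int count = 0;
    for (; v != 0; ++count) v &= v - 1;
    return count;
}
```

Whether any of these beats what the compiler already generates depends on the target, so measure on the hardware you care about before swapping them in.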

Hardware is directed by the demands of software ... thankfully 18/01/2012:12:41:20

it's all about the data in this one.

Many Core processors: Everything You Know (about Parallel Programming) Is Wrong! 17/01/2012:12:36:07

"if we are to successfully program manycore systems, is our cherished assumption that we write programs that always get the exactly right answers."

Another benefit to Data-Oriented design is that it's well prepared for inaccuracy.

Network programming is always harder when you try to lock down the network state in objects, and always simpler if you just keep all you know about what has happened in simple buckets of information. This tolerance for data that may lack accuracy lends itself well to working towards a future where information may be wrong, lacking, or late. Traditionally, serial programs or object oriented programs had very little they could do about missing dependencies or unverifiable data; a data-oriented approach, however, allows you to take this into account. It's very plausible to add catch-up or assumption steps if you centre your coding around the ideas of concurrency and data flowing through transforms.

Also: watch the video. I love the comparison between good concurrency design and Romeo and Juliet.

Cyclomatic complexity 06/01/2012:18:19:41

The takeaway from this is that inherently branchy development practices, the ones that make in-order processors cry, are inherently more defect-prone.

Practise better programming style: find ways to reduce your if count.
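One common way to reduce the if count is to move the decisions into data. The sketch below is a made-up example (the `Surface` enum and the friction values are assumptions for illustration): the branchy version has three decision points and four paths to test, the table version has one path.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

enum class Surface : std::size_t { Grass, Sand, Ice, Count };

// Branchy version: three decision points, four paths through the function.
float friction_branchy(Surface s) {
    if (s == Surface::Grass) return 0.8f;
    if (s == Surface::Sand)  return 0.6f;
    if (s == Surface::Ice)   return 0.1f;
    return 1.0f;
}

// Table version: the decisions become data, leaving one path through the code.
float friction_table(Surface s) {
    static const std::array<float, static_cast<std::size_t>(Surface::Count)>
        kFriction = {0.8f, 0.6f, 0.1f};
    const std::size_t index = static_cast<std::size_t>(s);
    assert(index < kFriction.size());  // Surface::Count itself is not a valid input
    return kFriction[index];
}
```

Adding a new surface to the table version means adding a value, not a branch, so the number of code paths stays the same as the data grows.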

Data organisation by domain knowledge 05/01/2012:09:24:17

A structure of arrays is usually better than an array of structures, but sometimes, if you're accessing the arrays in a sparse manner, keeping the data you would need together is even better. Thus you end up with a structure of arrays of structures.

This can be found by profiling, but sometimes domain knowledge can lead you to the best solutions. In this example, Julien Hamaide finds locality patterns based on the transforms that will be run on his data, through profiling and through knowledge of the data relationships.
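The three layouts can be sketched side by side. The particle fields and the `integrate_sparse` function are hypothetical, chosen only to show the shapes; they are not taken from Julien Hamaide's example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of structures: one particle's fields sit together, so a pass that
// only needs positions still pulls velocity and colour through the cache.
struct ParticleAoS { float x, y, z; float vx, vy, vz; unsigned colour; };

// Structure of arrays: each field is its own contiguous stream, which suits
// dense, linear passes over a single field.
struct ParticlesSoA {
    std::vector<float> x, y, z, vx, vy, vz;
    std::vector<unsigned> colour;
};

// Structure of arrays of structures: fields that are always used together
// are grouped, so a sparse lookup of one particle's position is one small
// contiguous read rather than three scattered ones.
struct Vec3 { float x, y, z; };
struct ParticlesSoAoS {
    std::vector<Vec3> position;
    std::vector<Vec3> velocity;
    std::vector<unsigned> colour;
};

// Sparse update: only the listed particles move, and colour is never touched.
void integrate_sparse(ParticlesSoAoS& p, const std::vector<std::size_t>& active, float dt) {
    for (std::size_t i : active) {
        assert(i < p.position.size() && i < p.velocity.size());
        p.position[i].x += p.velocity[i].x * dt;
        p.position[i].y += p.velocity[i].y * dt;
        p.position[i].z += p.velocity[i].z * dt;
    }
}
```

Which grouping is right depends on the transforms actually run over the data, which is why the choice comes from profiling or domain knowledge rather than from a rule.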

Things to consider : #08 Bad code is salvageable - bad data can be a much worse spaghetti to untangle 04/01/2012:14:02:11

The heart of good data design is knowing what the data is and how it relates to itself and the transforms through which it will pass. Normalisation is your buddy in many cases, but you need to extend it to encompass the hardware and the domain knowledge too.

Questions from a reader 03/01/2012:12:42:00

* Could a managed solution eventually draw closer to, or overtake, unmanaged code; can automatic detection of usage patterns help close the gap?

We've already seen how managed code can sometimes be faster than unmanaged code with the controversial "Java faster than C" article. The reason behind this can only be attributed to performance tweaks applied when the virtual machine's runtime compiler makes data-oriented optimisations. Data-oriented design can and will be leveraged by everyone interested in performance: not just those developing software products, but also those developing the languages that other developers work with.
Usage patterns have often been seen as the holy grail of compiler analysis: the idea that a compiler can figure out what you meant, then implement it in the best way possible. This fairy tale is impossible in statically compiled code, as by definition it is compiled without access to the final data on which the process is run. Managed code can make runtime modifications to its processes, and thus has the opportunity to do realtime data-directed optimisations.

Managed code is almost certainly going to be faster than any naively written unmanaged implementation.

* How portable are these solutions? Are PS3, XBox, PC architectures too different to formulate a common solution? What additional development / maintenance cost does that introduce?

Part of the reason why data-oriented design has been seen as a controversial subject is that the x86 platform's design was influenced heavily by the object oriented programming paradigm. While the hardware we know and love/hate is reasonably good at processing object oriented branchy code, it's not because it's a good general purpose CPU design, but because it's been designed to run existing software faster than the competitor's CPUs.

If it were not for object-oriented design, we would probably not be having the "shift to parallel" discussions this late, nor would we have such large caches on our CPUs. We might not even have spent so much time developing technologies to their fullest, such as the out-of-order processing hack, or the CPU resource sharing technique known as hyper-threading.

Because object oriented design tends to require some specific features of a computation system, it has decided the fate of many desktop CPU manufacturers' design projects. Only outside the desktop, where software is written for the hardware because there is only one customer and one platform, have there been good examples of leaving behind the features required by object-oriented design. The CELL BE is a good example of a piece of hardware that, given the right technique, blows away the competition on a processing power per watt or per dollar valuation. However, because the main target of the CELL BE was high business or games development, the bad press from the less adventurous developers caught on and has made the CELL out to be a failure.

As to portability of solutions, most data-oriented solutions are better because they concentrate on what needs to be done rather than what might be pretty to someone who likes to read abstractions or object-oriented code. This difference pays off on all platforms, from micro-controllers to quad core x86 machines. Most example code of how to write better code for the PPU or Xenon can be compiled for x86 and still see some marginal improvement. The reason the code isn't greatly better is that the x86 is very good at reducing the impact of Object-oriented code, but it can't fix it completely.

* Does DOD have a place outside game dev? Are games a special case, in that once released, the code is often not supported after a few patches? And you can't just throw more hardware at the solution with console dev.

Any place where resources are not limitless, DOD is preferable. Not only does it produce code that is simpler to reason about and simpler to maintain, thanks to an inherently transform-oriented approach to problem solving, but it is also faster and more efficient. There are hardly any areas of coding where being faster at no cost is seen as a disadvantage.

DOD as a games development paradigm is a fallacy. Every minute you spend developing code to get around a language feature, or to make some code look nicer, or play well with others due to the unnecessary restrictions put in place by the object oriented language you are lumbered with, is a minute lost forever. There is no evidence that object-oriented design makes working on large projects easier, but there is evidence that it makes it harder (see large-scale C++ development for some eye-openers.) You only have to look to really big business to see that the 10Mloc+ projects are normally made up of smaller transforms, an inherently data-oriented approach to development, each with separate and specialised code, potentially running on separate and specialised hardware.

The only reason why DOD could be considered games development only is because games development doesn't naturally consider this standard way of solving data-transform problems in a sensible manner and needs a name to guide it back on track.