DATA-ORIENTED DESIGN The hardware will thank you.
If you wish to submit an article, please contact email@example.com for details.
High level Data-oriented design starts with going component oriented 23/01/2012:19:56:23And to prove that it isn't just about throwing away your high level languages and embracing C, TyphonRT does data-oriented design in Java for Android.
see the website for even more: http://www.typhon4android.org/
A useful page full of code snippets. Many here are ripe for use in in-order, or superscalar, processors.
Many Core processors: Everything You Know (about Parallel Programming) Is Wrong! 17/01/2012:12:36:07"if we are to successfully program manycore systems, is our cherished assumption that we write programs that always get the exactly right answers."
Another benefit to Data-Oriented design is that it's well prepared for inaccuracy.
Network programming is always harder when you try to lock down the network state in objects, and always simpler if you just keep all you know about what has happened in simple buckets of information. This tolerance for data that may lack accuracy lends itself well to working towards a future where information may be wrong, lacking, or late. Traditionally, serial programs or object oriented programs had very little they could do about missing dependencies or unverifiable data, however, a data-oriented approach allows you to take this into account. It's very plausible to add catch up or assumptions steps if you centre your coding around the ideas of concurrency and data flowing through transforms.
also : Watch the video, I love the comparison between good concurrency design and Romeo and Juliet.
the take away from this, is that inherently branchy development practices, the ones that make in-order processors cry, are inherently more defective.
Practice better programming style, find ways to reduce your if count.
Structures of arrays is better than arrays of structures, but sometimes, if you're accessing the arrays in a sparse manner, keeping the data you would need together is even better. Thus you end up with a structure of arrays of structures.
This can be found by profiling, but sometimes domain knowledge can leverage the most optimal solutions. In this example by Julien Hamaide he finds locality patterns based on the transforms that will be run on his data through profiling and through knowledge of the data relationships.
Things to consider : #08 Bad code is salvagable - bad data can be a much worse spaghetti to untangle 04/01/2012:14:02:11http://www.reddit.com/r/PHP/comments/n5qsz/tips_for_developing_large_applications/c36i1yq
The heart of good data design is knowing what the data is and how it relates to itself and the transforms through which it will pass. Normalisation is your buddy in many cases, but you need to extend it to encompass the hardware and the domain knowledge too.
* Could a managed solution eventually draw closer to, or overtake, unmanaged code; can automatic detection of usage patterns help close the gap?
We've already seen how sometimes managed code can be faster than unmanaged with the controversial "Java faster than C" article. The reason behind this can only be attributed to performance tweaks that can be applied when the virtual machine's runtime compiler made data oriented optimisations. Data-oriented design can and will be leveraged by everyone that's interested in performance, not just those developing software products, but also the languages that other developers work with.
Usage patterns have often been seen as the holy grail of compiler analysis. The idea that a compiler can figure out what you meant, then implement it in the best way possible. This fairy tale is impossible in statically compiled code, as by definition it is compiled without access to the final data on which the process it run. Managed code can make runtime modifications to its processes, and thus has the opportunity to do realtime data directed optimisations.
Managed code is almost certainly going to to be faster than any naively written unmanaged implementation.
* How portable are these solutions? Are PS3, XBox, PC architectures too different to formulate a common solution? What additional development / maintenance cost does that introduce?
Part of the reason why data-oriented design has been seen as a controversial subject is that the x86 platform's design was influenced heavily by the object oriented programming paradigm. While the hardware we know and love/hate is reasonably good at processing object oriented branchy code, it's not because it's a good general purpose CPU design, but because it's been designed to run existing software faster than the competitor's CPUs.
If it were not for object-oriented design, we would probably be not having the "shift to parallel" discussions this late, nor would we have such large caches on our CPUs. We might not even have spent so much time developing technologies to their fullest such as the out-of-order processing hack, or the CPU resource sharing technique known as hyper-threading.
Because object oriented design tends to require some specific features of a computation system, it has decided the fate of many desktop CPU manufacturers design projects. Only outside desktop, where software is written for the hardware because there is only one customer, one platform, has there been good examples of leaving behind the features required by object-oriented design. The CELL BE is a good example of a piece of hardware that given the write technique, blows away the competition on a processing power per watt or dollar valuation. However, because the main target of the CELL BE was high business or games development, the bad press from the less adventurous developers caught on and has made the CELL out to be a failure.
As to portability of solutions, most data-oriented solutions are better because they concentrate on what needs to be done rather than what might be pretty to someone who likes to read abstractions or object-oriented code. This difference pays off on all platforms, from micro-controllers to quad core x86 machines. Most example code of how to write better code for the PPU or Xenon can be compiled for x86 and still see some marginal improvement. The reason the code isn't greatly better is that the x86 is very good at reducing the impact of Object-oriented code, but it can't fix it completely.
* Does DOD has a place outside game dev? Are games a special case, in that once released, the code is often not supported after a few patches? And you can't just throw more hardware at the solution with console dev.
Any place where resources are not limitless, DOD is preferable. Not only does it allow for simpler to reason about code, and also simpler to maintain due to an inherently transform oriented approach to problem solving, but it is faster and more efficient. There are hardly any areas of coding where being faster at no cost is something to be seen as a disadvantage.
DOD as a games development paradigm is a fallacy. Every minute you spend developing code to get around a language feature, or to make some code look nicer, or play well with others due to the unnecessary restrictions put in place by the object oriented language you are lumbered with, is a minute lost forever. There is no evidence that object-oriented design makes working on large projects easier, but there is evidence that it makes it harder (see large-scale C++ development for some eye-openers.) You only have to look to really big business to see that the 10Mloc+ projects are normally made up of smaller transforms, an inherently data-oriented approach to development, each with separate and specialised code, potentially running on separate and specialised hardware.
The only reason why DOD could be considered games development only is because games development doesn't naturally consider this standard way of solving data-transform problems in a sensible manner and needs a name to guide it back on track.
Adventures in data-oriented design : part 1 02/01/2012:00:23:12As with many other examples of data-oriented design Stefan Reinalter's post starts with an object-oriented version, then produces a data-oriented version from it. He includes his results and even points out that long term C++ programmers that push for the STL appreciate the impact of cache ignorant implementations.
Moving away from a flow controlled programming style and towards a more stream processing technique has been seen as a good direction by Data-Orietned Design aficionados, but this post about how FPGAs will be running OpenCL offers a truly revolutionary reason why you should move away from the random access techniques of Object Oriented Design, and embrace the data-in data-out method of developing.
John McCutchan finalises his series of posts on data-oriented programming with an article on linked lists, showing how intrusive linked lists can be a big win.
Things to consider : #07 - Wasted Cycles 03/12/2011:15:49:26Wasted cycles are ones where the CPU does nothing, not cycles that produce results that are not used.
c = a > b ? a : b;
like this, the code is simple and wastes few cycles
c = 500msQuery() ? 500msCalculationIfTrue() : 500msCalculationIfFalse();
In this case, we don't determine which function to call until the expensive query has finished. This line uses the least number of cycles to produce a result in c, but isn't fast.
bQuery = 500msQuery()
tValue = 500msCalculationIfTrue();
fValue = 500msCalculationIfFalse();
c = bQuery ? tValue : fValue;
In this example, it can be possible to execute all three functions in parallel, only joining in the final use when combining into a value to be assigned to c. This technique uses 50% more cycles, but gets done in 50% the time.
This technique is not always appropriate, don't forget to think about the numbers involved. This technique is also known as predication.
Seperation of schema from responsibilities 01/12/2011:17:07:14In object oriented design, there is a strong tendency to implement the set of responsibilities in the same place as the schema of the data. In C++ this is caused by declaring the internal representation and external interface in the same place. The pImpl idiom can mitigate some of the inherent problems in this approach, but in general, thinking about data as an object consistently deflects criticism that the programmer is keeping too much information available when it's not used.
For example, the this pointer points to all the members of an object, not just the ones necessary for the member function. Attempting to reduce the available items to only those necessary for the transform cuts data members from the object. This is normally impossible due to each object's member functions having disjoint sets of considerations. To make the this pointer only point to pertinent data and no more, we have to completely change how member functions of a class take their responsibilities and turn them into access patterns. This is a non-trivial change.
C++ classes inherently couple data members of a class to each other via their member functions and member functions are coupled to each other by the data they operate on. To reduce coupling, you have to invoke higher level lower performance features of the language.
In conclusion, keeping your data in collections of actually related data is better than keeping them in collections of related by owner. In many cases, game engines already separate out parts of entities into different pieces such as physics, rendering, gameplay, and audio entities. The optimal solution is somewhere between this and every member of every entity being accessed purely by foreign key.
Succinct explanation of why and when to use SoA 01/12/2011:15:03:21As always, there are no hard and fast structural rules with data-oriented. The only rule seems to be to consistently confirm you're not just guessing that it's better through profiling after pontificating.
Stream processing can be described as having your transforms restricted to an input stream, output stream, and a kernel. This way of transforming data is well proven as a high throughput technique. It is very high latency, but for graphics transforms the latency is unimportant compared to throughput, so it's a massive win.
But, there are times when stream processing latency becomes a problem, such as when you know there is only one element in the pipe.
When data oriented design talks about knowing the numbers, the values, and the shape of your data, it often suggests you turn your serial code that does one at a time processing into the stream processing model. But the important point is that it is only often, not always, so you still have to look at your data.
The fastest stream processing engine in the world, is still going to be slower if there is only ever one element in the pipe. The cutoff is not even two items, it's some number that relies on a combination of different things. Always profile before you make the switch, and then after, unless the numbers are massive, or minuscule.
You will never be able to figure out how best to fit your data into the right shape if you don't know what you're working with. Alex Darby (@darbotron) writes about the important difference between the fundamental data types in C/C++ and the intrinsic types of your hardware.
When thinking about the best way to transform your data, sometimes you need a random store, such as new or malloc, but these solutions aren't always the best fit for your problem.
Design patterns, or to give them their full name Elements of Reusable Object Oriented Software, have become entrenched in the minds of many othewise excellent programmers.
So, tell them to start thinking.
Design patterns fall into the category of educational books that tell you what to think, not how to think. This dangerous type of education goes out of date quickly, and in the case of design patterns, was out of date even when it was released as we already had languages that didn't, and never will, need the advice given.
Read this: http://perl.plover.com/yak/design/
And start thinking for yourself.
In order to know how best to orient your data, it's important to know what the hardware will do with it. Hazards can cause incorrect computation, and to get around that potential thread, in-order CPUs manifest workarounds that produce pipeline bubbles and stalls.
We don't normally have to worry to much about read after write data hazards as most compilers will do their best to organise code so that the latency between the operations is filled with other work, but sometimes you need to think about what you're doing otherwise you can end up making it impossible for the compiler to overlay these otherwise uncoupled threads of execution.
The most obvious hazard is the control hazard, where the branch operation requires information and won't be able to prefetch data as it doesn't know which instructions are going to be executed. Branch prediction will start processing one branch, but as soon as the result comes in, if it doesn't match the currently processing branch, it will flush the pipeline, causing in effect a massive bubble of wasted pipeline slots. So, if you can, try to use use branches a long time after you calculate the predicate, such as in shader model branching where both branches of processing are run, and which branch, including which result, is only decided later. See single assignment form for more information on how this is achieved.
read more here:
and static single assignment form:
Warrick Buchanan's N step plan for data oriented thinking 15/11/2011:23:07:56originally posted on sweng-gamedev
Perhaps this summary may aid yours and others thinking:
1) Never forget the CPU loads data from memory transforms it (in Acton-speak) and writes the modified data out
2) It doesn't transform just one input and produce one output - it processes streams of data whether or not you choose to see it that way
3) The natural unit of program composition is not a conceptional 'OO' object but functions that transform an input stream to an output stream (stream or list)
4) It sounds simple and obvious and it is - but decades of 'OO' teaching have obscured this fact in many peoples minds
5) Functional decomposition leads to extremely reusable code without the awkwardness OO constructs can introduce
6) By functional decomposition I mean your function only reads its input and writes its output and touches no hidden shared state inside
7) If you view the input and output parameters that are connecting your functions as asynchronous streams rather than instant value parameters I would suggest writing concurrent code becomes much easier
8) Such parameter streams can be implemented as message passing lockless queues (Forget STM) and functions can be invoked when they have parameters to process by kicking them to a pool of worker threads
9) Push your iterations on the parameter streams down to the lowest level function they can go and you get the 'Action: 'Where there is one there is many'
10) Notice how tuples of parameter stream data can be efficiently processed by SSE instructions etc
11) Investigate what this means in respect to languages such as Erlang, Haskell, C#, F#, C++, C and understand why in lots of ways staying closer to the C subset of C++ in OO constructs is a good thing (Templates on such functions are great tools also)
12) View your input parameters as immutable data structures
13) Optimise your immutable data structures by performing copies only when necessary and providing garbage collection for them and transmission support for distributed computing (Understand how this maps to functional language compiler optimisations) - and how this can collapse to simple double buffer schemes in cases etc
14) Notice how efficiently a component/aspect/state/whatever entity system can be implemented in tandem with this and how well this maps to relational database tables (and ignore object orientated databases from now on)
15) Organise data into tables of actual data and sets of different indices into that data - and think in terms of queries, adds/removes and updates
16) Just have structs of data with the only functions allowed being initialisers and those that can compute values from the data in an immutable fashion
17) Read up on dataflow/flow based programming and model your programs by data flow and transforms on the data rather than ANY CLASS HIERARCHY AT ALL
18) Put this into practice in your C++ projects by using only static functions of the form described utilising the grouping features of namespaces and then if a naturally easy to use object orientated hierarchy fits nicely on top of certain collections of functions then create OO objects that utilise your existing static functions. Notice how this probably fits lots of the advice of good OO programming.
I hope that helps!
According to Mike Acton of Insomniac Games, quick sort is not a concurrent algorithm.
And yet many sorts are parallelisable, such as the bitonic sort:
the radix sort:
and even the merge and quick sort.
The point of the article is not that they're not ready for parallel processing, but that they're not concurrent. The difference between concurrent and parallel is best exemplified in Mike's earlier post on insert/delete/search
Concurrency is about systems that work at the same time, not simply about large scale serial processes that can have their workload shared out onto parallel hardware for better performance.
Object Oriented Programming makes your life hard 12/11/2011:15:53:47Procedural programming and functional progamming don't really make it easier, but they don't make it harder. There are plenty of pitfalls of OOP, and as with any religion, the zealots will forgive cases like this while always remembering that one program they touched ten years ago that had some globals and free functions that made their life a misery.
One more reason why C++ classes can cause problems 12/11/2011:13:34:52the static keyword has three different meanings based on how it's used.
Though this post
doesn't appear to be written by someone who knows why these things are true, the first item:
#1. Declare as static functions that are not used outside the file where they are defined
Declaring a function as static forces an internal linkage within that file, which can improve the performance of the code. Functions that are not declared as static default to external linkage, which may inhibit certain optimizations, such as aggressive inlining, with some C/C++ compilers.
This implies that because all class member functions are global scope, they potentially suffer the same optimisation limitation. Good old C++.
Also, ignore point 5, you should always be using bitwise operators for such trivial logic.
Some people optimise code by guesswork, which is possibly why a lot of people recite "premature optimisation is the root of all evil". There is no good way past the obvious platform specific hints to optimise a codebase unless you first know how much time is being spent where. You must profile. You must gather data.
In these posts by Drew Thaler evidence is used to find the real culprit. Without evidence, in both cases, the assumptions would have blocked the real reason from sight forever.