I came across this article on writing a really fast JSON parser here.
There are a few links in there that take you on a great tour of optimisations, if you can spare the time to follow them, including a lovely post about a 50% improvement in speed in SQLite.
I was recently shown a video by Scott Meyers on CPU caches, and the first ten minutes alone make a reasonable case that DOD affects all languages, not just C++ or C.
Here's the link to the video for your viewing pleasure: https://vimeo.com/97337258
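The effect he demonstrates early in the talk is easy to reproduce yourself. Here's a minimal sketch (my own, not code from the talk): it sums the same matrix twice, and only the traversal order, and therefore the cache behaviour, differs.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// My own minimal sketch, not code from the talk: sum the same matrix
// row-by-row (sequential, cache-friendly) and column-by-column
// (jumps a whole row ahead on every access).
int main() {
    const std::size_t N = 4096;
    std::vector<int> m(N * N, 1);

    auto time_sum = [&](bool row_major) {
        auto t0 = std::chrono::steady_clock::now();
        long long sum = 0;
        for (std::size_t i = 0; i < N; ++i)
            for (std::size_t j = 0; j < N; ++j)
                sum += row_major ? m[i * N + j] : m[j * N + i];
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                      std::chrono::steady_clock::now() - t0).count();
        std::printf("%s: %lld in %lld ms\n",
                    row_major ? "row-major   " : "column-major",
                    sum, (long long)ms);
    };

    time_sum(true);   // fast: walks memory in order
    time_sum(false);  // slow: one cache miss after another
}
```

Same data, same work, very different timings: exactly the point about DOD applying regardless of language.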
Manipulating data in structure-of-arrays format can be unwieldy for some, but this post talks about making things easier using some simple templating to replace the manual side of iterating through the arrays.
Read C++ encapsulation for Data-Oriented Design: performance here, and learn how to keep your DOD SoA approach tidier.
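To give a flavour of the technique (a hypothetical minimal sketch of my own; the post's actual code differs), the idea is to keep one array per field but hand back a lightweight proxy, so call sites still read like array-of-structs code:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch, not the linked post's code: each field lives
// in its own array (SoA), and operator[] returns a small proxy of
// references so the call site looks like ordinary AoS access.
struct Particles {
    std::vector<float> x, y;        // one array per field
    std::vector<float> velocity;

    struct Ref {                    // proxy over one logical element
        float &x, &y, &velocity;
    };

    Ref operator[](std::size_t i) {
        return Ref{x[i], y[i], velocity[i]};
    }
    std::size_t size() const { return x.size(); }
};

void integrate(Particles &ps, float dt) {
    for (std::size_t i = 0; i < ps.size(); ++i) {
        auto p = ps[i];             // reads like plain AoS code
        p.y += p.velocity * dt;     // but each field streams from its own array
    }
}
```

The proxy is the whole trick: the storage stays cache-friendly while the iteration code stays readable.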
It has become more obvious to people involved in optimisation that the x86 architecture is a difficult platform to understand at its core. This is partly because of the multitude of different CPUs that support the instruction set, each with its own timings, but also because of the latest breed of extraordinarily out-of-order CPUs. Knowing what will actually happen inside an i7 has become a near-impossible task.
Read Robert Graham's post x86 is a high-level language and try to see why it's so very difficult to grok the flow of data in these chips, and why it's so hard to guess which algorithm will perform best without doing a lot of real-world tests.
A nice read on why grep is quick. Some simple stuff, some awesome algorithm usage, and generally the kind of thing you might want to keep in your head in case you ever come across a search problem that resembles grep's in any way.
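For a taste of the algorithm side, here's a bare-bones Boyer-Moore-Horspool sketch, a simplified relative of the Boyer-Moore search the grep post discusses (the function and names are mine, not grep's). The point is that most input bytes are never examined at all:

```cpp
#include <array>
#include <cstddef>
#include <string>

// Bare-bones Boyer-Moore-Horspool (a simplification of full
// Boyer-Moore): precompute how far the window may skip for every
// byte, then compare right-to-left so whole runs of input are
// skipped without being inspected.
std::size_t horspool(const std::string &text, const std::string &pat) {
    const std::size_t m = pat.size(), n = text.size();
    if (m == 0 || m > n) return std::string::npos;

    std::array<std::size_t, 256> skip;
    skip.fill(m);                              // default: skip the whole pattern
    for (std::size_t i = 0; i + 1 < m; ++i)
        skip[(unsigned char)pat[i]] = m - 1 - i;

    for (std::size_t pos = 0; pos + m <= n; ) {
        std::size_t i = m;
        while (i > 0 && text[pos + i - 1] == pat[i - 1]) --i;
        if (i == 0) return pos;                // match found
        pos += skip[(unsigned char)text[pos + m - 1]];
    }
    return std::string::npos;
}
```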
I was skeptical at first, but the author appears to have tested his efforts on real hardware, which of course is a core tenet of DOD. This is not a post about a new invention, but a set of results from tests in which the author replaces a hash table with alternatives. It's interesting to look at the different timings, but remember to test your own code and not just follow blindly: overhead elsewhere in your program could make the slowest contender in these tests suddenly the fastest.
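As one example of the kind of alternative that gets measured in comparisons like this (my own illustration; I'm not claiming it's one of the post's candidates), a sorted vector with binary search trades hashing for contiguous, cache-friendly memory:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Illustrative hash-table alternative (not necessarily from the
// linked post): a vector of key/value pairs kept sorted by key.
// Lookups are O(log n) over contiguous memory, often kinder to the
// cache than chasing hash buckets, at the cost of O(n) inserts.
struct SortedMap {
    std::vector<std::pair<int, int>> data;   // always sorted by key

    void insert(int key, int value) {
        auto it = std::lower_bound(data.begin(), data.end(), key,
            [](const std::pair<int, int> &p, int k) { return p.first < k; });
        if (it != data.end() && it->first == key) it->second = value;
        else data.insert(it, {key, value});
    }

    const int *find(int key) const {
        auto it = std::lower_bound(data.begin(), data.end(), key,
            [](const std::pair<int, int> &p, int k) { return p.first < k; });
        return (it != data.end() && it->first == key) ? &it->second : nullptr;
    }
};
```

Which structure wins depends entirely on your sizes and access patterns, which is exactly why you should time it in your own program.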
Choose a paradigm that allows for the simplest, least complex, most provably correct code.
Here's another example of premature optimisation:
Swap data for energy, and the demand-oriented approach to fulfilment changes the function used to determine fitness. In the energy case, the demand over time was well known, yet ignored by thousands of people installing expensive hardware.
A lovely book on optimisation by Carlos Bueno (from Facebook's performance team).
The book is available for free download here.
As reference material for the book, a GitHub project has been started to show the development of a game in both the Object-Oriented and Data-Oriented approaches.
Expect slow updates right now, as it only has one developer and they are in full-time employment at a startup, so spare time is scarce. However, if you wish to follow along, the project is hosted here on GitHub for all to see.
In addition to the parallel game development, submissions from other developers would be appreciated, specifically any demo code that provides ways to build timings for the performance-oriented points of the book: for example, any code that could directly show the impact of bad pipelining, bad cache alignment, or even the effects of write combining. The only rule will be that it has to be simple and able to run on many platforms; single-platform statistics aren't much use unless they target currently trending hardware like ARM-based CPUs. One such demo is sketched below.
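Here's a hypothetical example of the kind of submission meant above (my own sketch, not project code): the classic sorted-versus-unsorted branchy loop, which portably shows branch misprediction stalling the pipeline.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <random>
#include <vector>

// Hypothetical demo sketch: the same branchy loop over the same
// values runs much faster once the data is sorted, because the
// branch predictor stops being surprised. A simple, portable way
// to show pipeline stalls.
long long sum_over_threshold(const std::vector<int> &v) {
    long long sum = 0;
    for (int x : v)
        if (x >= 128) sum += x;   // hard to predict when data is random
    return sum;
}

int main() {
    std::vector<int> v(1 << 24);
    std::mt19937 rng(42);
    for (int &x : v) x = rng() % 256;

    auto time_it = [&](const char *label) {
        auto t0 = std::chrono::steady_clock::now();
        long long s = sum_over_threshold(v);
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                      std::chrono::steady_clock::now() - t0).count();
        std::printf("%s: sum=%lld in %lld ms\n", label, s, (long long)ms);
    };

    time_it("unsorted");
    std::sort(v.begin(), v.end());
    time_it("sorted  ");          // same work, far fewer mispredictions
}
```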
When traversing objects stored on an intrusive linked list, it only takes one pointer indirection to get to each object, compared to two for a std::list of pointers to the objects. That means less memory-cache thrashing, so your program runs faster, particularly on modern processors, which incur huge delays on memory stalls.
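A minimal sketch of the difference (my own example, with hypothetical names):

```cpp
#include <string>

// With a std::list<Enemy*> you follow the list node's link and then
// the stored pointer to reach each Enemy: two hops. An intrusive
// list embeds the link in the object itself, so traversal is one
// hop straight from object to object.
struct Enemy {
    std::string name;
    int health = 100;
    Enemy *next = nullptr;   // intrusive link lives inside the object
};

void damage_all(Enemy *head) {
    for (Enemy *e = head; e != nullptr; e = e->next)
        e->health -= 10;     // one indirection per object
}
```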