Particle man, particle man, doing the things a particle can

by baggers

Well it's been a good week.

The first thing I got working was a particle system. This was based on the techniques shown here and on one of ferris' old demos.

The base technique is very simple:

  • A particle system has two gbuffers
  • each gbuffer has:
    • a 1024x1024 rgb32f texture which holds the position of each particle
    • a 1024x1024 rgb16f texture which holds the velocity of each particle
    • a flag to say which gbuffer is source for this frame (the other will be considered the destination)

You then run 1 shader that:

  • takes the velocity from the current gbuffer and updates it, storing the result in the destination gbuffer
  • takes the position from the current gbuffer and adds the current velocity to it

And another shader that draws 1048576 quads using the positions from the current gbuffer

Finally you swap the source & destination gbuffer.
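As a rough CPU-side illustration of that ping-pong loop (in Python rather than shaders, and not the actual cepl code), the sketch below keeps two buffers and an index saying which one is the source. The names `make_gbuffer`, `update` and `ParticleSystem`, the constant `gravity` term, and the choice to add the source velocity (rather than the freshly updated one) to the position are all assumptions made for the example.

```python
def make_gbuffer(n):
    """One 'gbuffer': a position list and a velocity list, one entry per particle."""
    return {"pos": [(0.0, 0.0, 0.0)] * n, "vel": [(0.0, 0.0, 0.0)] * n}


def update(src, dst, gravity=(0.0, -0.01, 0.0)):
    """Stand-in for the update shader: read only from src, write only to dst."""
    for i, (p, v) in enumerate(zip(src["pos"], src["vel"])):
        # Update the velocity and store the result in the destination buffer.
        dst["vel"][i] = tuple(vc + gc for vc, gc in zip(v, gravity))
        # Add the source velocity to the source position (assumed ordering).
        dst["pos"][i] = tuple(pc + vc for pc, vc in zip(p, v))


class ParticleSystem:
    def __init__(self, n):
        self.buffers = [make_gbuffer(n), make_gbuffer(n)]
        self.source = 0  # index of the buffer that is read this frame

    def step(self):
        src = self.buffers[self.source]
        dst = self.buffers[1 - self.source]
        update(src, dst)
        # Swap roles: this frame's destination is next frame's source.
        self.source = 1 - self.source
```

Because the update never reads and writes the same buffer, every particle can be processed independently, which is what lets the GPU version handle all of them in one pass.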

The result is 1 million particles running very smoothly. I'm not sure of the final fps as my machine was capped at 60fps. You can see the result in the picture attached, yay!

This was cool and will serve as a great base for a particle system that actually looks pretty :) However there was one fly in the ointment: my code to create the initial textures and streams was disgustingly slow.

Revisiting the code behind my abstraction over c-arrays made it very clear just how much I have learned since I started this project. I was throwing away performance and memory all over the shop!

Ok so a little background: cepl (my lisp abstraction over gl) needs to send lots of data to the gpu. As opengl is a C library we talk to it through cffi (the common foreign function interface). This lets us do all the usual things an ffi does: allocate 'C memory', call C functions, etc.

In cepl I have a c-array type which holds a pointer, the dimensions of the array and the datatype of the elements. I then make functions that feel very lispy to interact with this c-array, and use cffi to do the work behind the scenes. Common lisp is a dynamic language, but it is one that compiles to machine code and has a bunch of ways to specify types and meta-data the compiler can use to optimize your code. For certain classes of problems it can outperform C (but at that point you are HEAVILY annotating your code).

One big place I was throwing away performance was how I was converting lisp data to C data. The conversion functions have an argument for specifying the C type of the data. I was providing this, but at runtime, so the library (and compiler) had no chance to hardcode the call to the correct conversion function. This meant HUGE numbers of type lookups. This was fixed by generating functions with the types hardcoded and storing these new lookup functions with the c-array.
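To make the idea concrete, here is a small sketch in Python (not lisp, and not how cepl or cffi are actually written). It contrasts a setter that resolves the element type on every call with an array object that resolves it once and stores specialised accessors alongside the data. The `FORMATS` table, `set_element_slow` and this `CArray` class are hypothetical names invented for the example.

```python
import struct

# Hypothetical type table: element type name -> struct format code.
FORMATS = {":float": "f", ":int": "i", ":ubyte": "B"}


def set_element_slow(buf, index, type_name, value):
    """Slow path: the type is looked up again for every single element."""
    fmt = struct.Struct("=" + FORMATS[type_name])
    fmt.pack_into(buf, index * fmt.size, value)


class CArray:
    """Raw storage plus dimensions, element type, and accessors that were
    specialised for that element type once, at creation time."""

    def __init__(self, length, type_name):
        fmt = struct.Struct("=" + FORMATS[type_name])  # resolved exactly once
        self.dimensions = (length,)
        self.element_type = type_name
        self.buf = bytearray(length * fmt.size)
        size, pack_into, unpack_from = fmt.size, fmt.pack_into, fmt.unpack_from
        # The closures below have the element type baked in, so no
        # per-element lookup happens when they are called.
        self.set = lambda i, v: pack_into(self.buf, i * size, v)
        self.get = lambda i: unpack_from(self.buf, i * size)[0]
```

Usage looks like `a = CArray(4, ":float"); a.set(2, 1.5); a.get(2)`. The per-element work shrinks to a single pack or unpack call, which is the same trade the generated lisp functions make.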

Next I was allocating massive amounts of memory because I had written the code without thinking about performance. This is a fine thing to do if it is then easy to refactor into performant code. This was turning out to be a little ugly though, the number of nested loops was increasing. To combat this I decided to make functions to map across the c-arrays so now I have:

  • map-c takes a c-array and a function, it calls the function on every element in the c-array and returns a new c-array
  • map-c-into takes a source c-array, a destination c-array, and a function. It calls the function on every element in the source c-array and stores the result in the destination c-array.
  • across takes a c-array and a function. It then calls the function, passing the c-array and the indices of the element it is currently visiting. You could then destructively modify the c-array if you want to
  • across-ptr takes a c-array and a function. It then calls the function, passing the pointer and the indices of the element it is currently visiting.

With these it is super easy to make functions that modify the C memory without converting between lisp & c data more than necessary.
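The four helpers can be sketched roughly as follows, using Python's `ctypes` arrays as a stand-in for cepl's c-arrays. This is only an illustration under some assumptions: it handles one-dimensional arrays (so "indices" becomes a single index), the underscore names mirror the lisp ones, and the real cepl signatures may well differ.

```python
import ctypes


def map_c_into(src, dst, fn):
    """Call fn on every element of src, storing each result in dst."""
    for i in range(len(src)):
        dst[i] = fn(src[i])
    return dst


def map_c(src, fn):
    """Like map_c_into, but allocates a new array of the same type and length."""
    return map_c_into(src, type(src)(), fn)


def across(arr, fn):
    """Call fn(arr, index) for every index; fn may modify arr in place."""
    for i in range(len(arr)):
        fn(arr, i)
    return arr


def across_ptr(arr, fn):
    """Call fn(pointer, index), where pointer addresses element index of arr."""
    elem_type = arr._type_
    size = ctypes.sizeof(elem_type)
    for i in range(len(arr)):
        elem = elem_type.from_buffer(arr, i * size)  # shares arr's memory
        fn(ctypes.pointer(elem), i)
    return arr
```

For example, `map_c((ctypes.c_float * 3)(1, 2, 3), lambda x: x * 2)` returns a new three-element float array, while `across` and `across_ptr` write straight into the existing memory without building any intermediate list.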

With these done I got the time to generate and populate the textures and buffers from around 20 seconds to about 0.8 seconds. That was a damn good start :D

However I'm now in the mood for optimizing and I'm noticing I'm still allocating more memory than I could be. After a lot of digging it turns out that this may be related to the dispatch of the generic-functions (kind of like methods) used in converting types in cffi. Luckily they have a way of telling the compiler how to optimize this, and so now the C struct -> lisp data conversion is much more memory efficient (and plenty faster too). However the feature to optimize the lisp data -> C struct conversion is not currently implemented.

So now my next task is to add this feature to that library and see if they accept it into the project. It would be awesome if they do as cffi has made my entire cepl project possible, and to get my code into there would feel great.

Right, that's all for now, time to go cook food.


Last Edited on Sun Mar 13 2016 14:17:58 GMT-0400 (EDT)

bau5 commented

Hold on.... Theme song Ok I'm ready!

on Mon Mar 14 2016 12:12:12 GMT-0400 (EDT)

bau5 commented

This sounds pretty awesome! And kudos on that major performance improvement through all that you've learned 👍 It's always so satisfying to look back at an old project when you weren't as familiar with the language and you can see how much you've improved.

Anyways, I'd love to see a vidya (or gif) of this in action! Nicely done!

on Mon Mar 14 2016 12:17:17 GMT-0400 (EDT)

Zach Harvest commented

I had a friend come in the room while I was on skype with Jake, and I was talking about blender and my friend asked Jake what he was doing. And more or less Jake replied with something like "Yeah, I'm making something like blender" and that just blew my friend's mind lol. That's how I feel when I'm working with particles and you're making a particle engine :P

on Mon Mar 14 2016 15:32:11 GMT-0400 (EDT)

ferris commented



on Mon Mar 14 2016 22:39:08 GMT-0400 (EDT)