Donald Hays

Object Pooling in my Lua UI Library

April 14, 2026

I’m working on a UI library. It’s not ready for a 1.0 release yet, but I’ve slowly started opening things up to get feedback while it’s still easy for me to make major changes. I recently posted an update video about adding image widgets.

I’ve been rather reckless with memory management to this point. I freely allocate memory when I need it and drop it on the floor when I’m done with it, leaving it for the garbage collector. I’ve had plenty of painful experiences with garbage collector pauses in games in the past, so I’ve been concerned this could be a pain point in the future.

However, I’ve noticed I’ve had far fewer problems with garbage collectors recently. I suspect there are multiple reasons for this, not least of which is the tendency towards incremental garbage collectors in game engines. They spread out the collection work over time, making it less likely to blow a frame’s time budget.

Regardless, developers often use techniques to reduce the need for garbage collection. If your code produces less garbage, there’s less pressure to collect it. Object pools can be used for this. When you’re done with an object, instead of dropping it for the GC, you instead put it into a reuse pool. Then, when you need a new object, you can recycle an existing object out of the pool, instead of allocating an entirely new one. You can save time spent allocating new memory, and especially save time spent in garbage collection cycles.

But object pools have their own tradeoffs. Most significantly, they add complexity. It’s an entire application-space system that doesn’t otherwise exist. Users are responsible for adding objects to the pool when they’re done. Objects don’t come out of the pool freshly initialized, so care must be taken to clean them as necessary. Finally, managing the pool itself takes CPU time.

That last point was a particular open question for me. Garbage collectors have a performance cost, of course, but they’re a low-level system that has been heavily optimized over time, and in my experience they’re quite good now. By using object pools, I would be adding my own bespoke layer of memory management, implemented in a dynamically-typed language. Would I actually outperform the GC?

So I didn’t want to jump in haphazardly. I only wanted to use object pools if I could demonstrate to myself that they actually yielded real improvements. So I waited until I had a big enough sample project that I could capture meaningful measurements.

The Implementation

My implementation is about as simple as it gets. It offers two methods: iui.get and iui.put. The pool supports multiple object types. To distinguish types, you pass a typename string to iui.get. If the sub-pool for that type is empty, iui.get generates a new object; otherwise it returns an available one. When you’re done with an object, you just pass it to iui.put, and it’ll assign it to the correct sub-pool.

There’s one potential gotcha: iui.get assigns typename to the object’s _typename key. This key will appear if you try to iterate the object via pairs, so that’s something to be mindful of.

Every type pool also tracks an index into its internal storage, called top. Originally, I pushed objects onto the pool using a simple table.insert, and popped them using table.remove. Doing that requires the table to figure out the relevant index itself, based on the size of the pool. However, finding the length of an array-style table has O(log n) time complexity. That’s very fast, but it’s still time that can be avoided. By tracking the index myself, I can point the operations directly at the correct location. I measured a small additional performance gain doing this, so it seems worth it.
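A minimal sketch of the whole system might look like this. To be clear, the pools table and its internal layout here are illustrative, not the literal implementation; only iui.get, iui.put, the _typename key, and the top index come from the description above:

```lua
local iui = {}

-- One sub-pool per typename: an array of free objects plus a
-- manually tracked top index, so push and pop never need the
-- O(log n) length lookup table.insert/table.remove would do.
local pools = {}

function iui.get(typename)
    local pool = pools[typename]
    if not pool then
        pool = { top = 0 }
        pools[typename] = pool
    end
    local object
    if pool.top > 0 then
        -- Recycle the topmost free object. Note: it is NOT freshly
        -- initialized; callers must reset any fields they rely on.
        object = pool[pool.top]
        pool[pool.top] = nil
        pool.top = pool.top - 1
    else
        object = {}
    end
    -- Lets iui.put route the object back to the right sub-pool.
    -- This key will show up when iterating the object via pairs.
    object._typename = typename
    return object
end

function iui.put(object)
    local pool = pools[object._typename]
    pool.top = pool.top + 1
    pool[pool.top] = object
end
```

With top tracked by hand, get and put are each just a couple of direct array reads and writes, with no length computation involved.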

Pooling Draw Calls

Though I do temporary allocations all over the library, there’s one thing responsible for the majority: draw calls.

Widgets in the library perform both behavior and presentation in one function. However, frameworks like LÖVE and LÖVR separate update and draw into two separate functions. To reconcile this, the graphics commands in a widget need to be deferred until the draw phase.

Previously, I would wrap the graphics commands in an anonymous function, which I would then push onto a queue for deferred execution by calling iui.draw.

if clip then
    iui.draw.pushClip(bx, by, bw, bh)
end

-- Every one of these anonymous functions would create a
-- temporary heap allocation, and every call to every
-- widget did this.
iui.draw(function()
    iui.graphics.setColor(1, 1, 1)
    iui.graphics.image(image, filter, ox, oy, ow, oh)
end)

if clip then
    iui.draw.popClip()
end

widgets/image.lua, old

This worked, but those anonymous functions would require heap allocations to close over their state. So every widget would cause a temporary allocation, which would have to be collected by the GC later.

Now, these closures aren’t a good candidate for the object pool, but I had a different strategy in mind. I could make the individual graphics calls themselves create command objects that would go on a queue for later execution. Those command objects could use the object pool. As an added bonus, the iui.draw call would go away completely, leaving a more straightforward programming model.

if clip then
    iui.draw.pushClip(bx, by, bw, bh)
end

-- It *looks* like we're drawing here and now, but actually
-- these graphics APIs now create and enqueue deferred
-- command objects.
iui.graphics.setColor(1, 1, 1)
iui.graphics.image(image, filter, ox, oy, ow, oh)

if clip then
    iui.draw.popClip()
end

widgets/image.lua, new

Importantly, this is not exactly a one-to-one switch from temporary allocations to object pools. We’re getting rid of the anonymous function for iui.draw entirely, while the iui.graphics APIs are now acquiring, configuring, and enqueueing draw command objects, instead of just drawing directly. So the system would involve fewer temporary allocations, but more objects overall. I felt confident I would see a big drop in garbage, but would it be faster?
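To make the shape of the new path concrete, here’s a sketch of one command type end to end. The command fields, the executor table, and the flush function are illustrative names I’m using for the purpose of this sketch, not necessarily the library’s actual internals: each iui.graphics call acquires a pooled command object, records its arguments, and enqueues it; the framework’s draw callback later executes every queued command and immediately puts it back in the pool.

```lua
local iui = { graphics = {} }

-- Minimal stand-in for the typed pool described earlier.
local pools = {}
function iui.get(typename)
    local pool = pools[typename] or { top = 0 }
    pools[typename] = pool
    local object
    if pool.top > 0 then
        object = pool[pool.top]
        pool[pool.top] = nil
        pool.top = pool.top - 1
    else
        object = {}
    end
    object._typename = typename
    return object
end
function iui.put(object)
    local pool = pools[object._typename]
    pool.top = pool.top + 1
    pool[pool.top] = object
end

-- The per-frame command queue.
local queue = {}

-- Widget-facing API: looks immediate, actually defers.
function iui.graphics.setColor(r, g, b, a)
    local cmd = iui.get("setColor")
    cmd.r, cmd.g, cmd.b, cmd.a = r, g, b, a
    queue[#queue + 1] = cmd
end

-- One executor per command type; each would call into the real
-- backend, e.g. love.graphics.setColor in LÖVE.
local execute = {
    setColor = function(cmd)
        -- love.graphics.setColor(cmd.r, cmd.g, cmd.b, cmd.a)
    end,
}

-- Run from the framework's draw callback: execute each queued
-- command, then recycle it instead of dropping it for the GC.
function iui.flush()
    for i = 1, #queue do
        local cmd = queue[i]
        execute[cmd._typename](cmd)
        iui.put(cmd)
        queue[i] = nil
    end
end
```

In the steady state this allocates close to nothing per frame: the command objects that flush recycles are the same ones the next frame’s graphics calls pull back out of the pool.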

Results

Object pooling worked!

In my sample app, temporary memory allocations dropped from about 60 kilobytes per frame to about 20, and garbage collection cycles ran only about a third to a fourth as often.

The average update time dropped by about 13%, while the median update time dropped by about 5%. The average improved by more than the median because it’s more impacted by the rare but significant garbage collection cycles. If Jeff Bezos walks into a bar, the average wealth of the patrons would immediately jump by a billion dollars, but if you sorted the patrons from richest to poorest, the patron standing in the middle would still be about as wealthy as before.

Finally, the standard deviation dropped by about half, which again indicates that rarer garbage collections yield more consistent frame times.

I’m especially pleased by the fact that I saw a real performance improvement even though the graphics system now enqueues multiple command objects per widget, instead of just one closure per widget. This suggests that multiple pooled objects win out over even a single temporary allocation. I may well see even bigger improvements in cases where the switch to pooled objects is more one-to-one.

Footnote

Under the previous model, the anonymous draw function would execute during the engine’s draw phase. If you wanted to perform any other tasks at that time, like messing with the render backend’s transform state or such, you could do so in that method.

Since the new model enqueues limited draw commands, that opportunity’s no longer available. To account for that possibility, I added a method called iui.draw.enqueue to let you execute arbitrary code during the draw phase, after previously issued draw commands, but before subsequent commands.

Yes, this means that I’m introducing temporary allocations again, after having just got rid of them, but this will be much rarer than before: currently, none of the built-in widgets make use of this functionality at all.

iui.draw.enqueue(function()
    -- Adjust the transform state
    
    -- Perform raw graphics rendering
    
    -- Unwind the change in transform
end)