After a busy few months, I am back at this blog. This week we will take a deep dive into the GPU memory management in my small game engine. The system is in no way, shape or form perfect for AAA usage. However, I think it is quite cool, simple and easily scalable to such levels.

The games industry thrived even when devices had very little memory, and efficient memory management has remained a major point of interest throughout the years. Many memory management techniques, such as resource streaming, were inevitable given our passion for building the perfect system for a perfectly smooth game experience.
Normally, user applications rely on the operating system (or some other underlying system) to handle memory, but game engines almost always write their own management logic. We do this because it gives us additional control over the data, and since we usually know how and when the data is needed, we can come up with faster, simpler and sometimes more creative solutions that let our games perform better.
In this new generation of game development, developers have stopped thinking too much about memory management strategies and mostly need to learn one of the many ready-made game engines available. Even so, these systems are always at work internally, enabling good memory usage (provided the developers know what they are doing), and managing GPU memory is no different. One can even argue that GPU memory is easier to manage than CPU memory because we usually deal with only two types of resources, i.e. buffers and textures. But I say it depends on the game’s needs and how you want the GPU to treat those buffers and textures.
So that brings us to what my needs were for this engine. Well… it is not easily explainable, but to put it simply… my goal for this engine is to let me tinker with any rendering topic (including memory management) without everything falling apart and needing a complete refactor. So I try to go for the simplest solution. But due to my itch for designing the perfect system and writing the most awesome code to ever exist XD, I cannot just leave it at “simple”. So the goal is to engineer it enough that it fits well with whatever, keep it simple enough to understand and change whenever, and still have it somewhat resemble what modern game engines do, because after all this exercise was also for me to understand the concept.
To understand memory management for GPUs, we start with what is available at the lowest level. Since my engine is built on the D3D12 API and this system is written around it, let’s talk about ID3D12Heap.
Here is what MSDN has to say about it, under “ID3D12Heap interface (d3d12.h)”:
“A heap is an abstraction of contiguous memory allocation, used to manage physical memory. This heap can be used with ID3D12Resource objects to support placed resources or reserved resources.” Inheritance: “The ID3D12Heap interface inherits from ID3D12Pageable.”
So a heap is presented as a contiguous allocation of physical memory. But this is just how it appears above the abstraction. Internally it is much more than this.
Quoting a post on AMD GPUOpen: “When developing a graphics application using Direct3D 12 for a PC with a discrete graphics card, we work with 2 types of memory: system RAM located on the motherboard and video RAM (VRAM) located on the graphics card. The main processor (CPU) has fast and direct/local access to the system RAM, while the graphics processor (GPU) has fast and direct access to the VRAM. GPU has access to the system RAM via the PCIe bus but the communication is significantly slower.”
On integrated GPUs, there is no discrete VRAM. Instead, the GPU and CPU share system memory. The memory is divided into regions, some of which are GPU optimized with different caching policies, but it’s all fundamentally the same physical RAM. This is why on integrated graphics, the distinction between default heaps and upload heaps becomes more about cache coherency and access patterns than physical memory location.
MSDN also mentions that the ID3D12Heap interface inherits from ID3D12Pageable. ID3D12Pageable is the base interface for anything that actually occupies GPU memory and can be moved in and out of it under the Windows Display Driver Model (WDDM). Both ID3D12Heap and ID3D12Resource (which we will discuss later) inherit from it. The key insight here is that Windows maintains a GPU memory manager at the OS level, sitting above the driver. This memory manager needs to track what’s consuming VRAM and make decisions about what stays resident when memory gets tight. This is where memory management differs between PC and consoles. Consoles do not have anything like this happening behind the scenes, leaving more control to the developers. But since a PC can have multiple applications running simultaneously, the OS needs to give every app virtually exclusive access to the GPU. So, using this tracking information, the OS evicts some pageable objects based on when they were last accessed. You can read more about Residency here.
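To make the "evict what was used least recently" idea concrete, here is a toy model. It is only a sketch under my own assumptions: the Pageable struct, the frame counter and evictUntilWithinBudget are hypothetical names, and the real OS memory manager is far more involved than a sort over a vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a pageable object (a heap or a committed resource).
struct Pageable
{
    std::string name;
    uint64_t sizeInBytes;
    uint64_t lastUsedFrame; // Frame index at which the GPU last touched it.
    bool resident;
};

// Evicts the least recently used resident objects until the resident total
// fits in budgetInBytes. Returns the names of the evicted objects in order.
std::vector<std::string> evictUntilWithinBudget(std::vector<Pageable>& objects, uint64_t budgetInBytes)
{
    uint64_t residentBytes = 0;
    std::vector<Pageable*> residents;
    for (Pageable& object : objects)
    {
        if (object.resident)
        {
            residentBytes += object.sizeInBytes;
            residents.push_back(&object);
        }
    }

    // Oldest access first, so those are the first candidates for eviction.
    std::sort(residents.begin(), residents.end(),
        [](const Pageable* a, const Pageable* b) { return a->lastUsedFrame < b->lastUsedFrame; });

    std::vector<std::string> evicted;
    for (Pageable* object : residents)
    {
        if (residentBytes <= budgetInBytes)
            break;
        object->resident = false;
        residentBytes -= object->sizeInBytes;
        evicted.push_back(object->name);
    }
    return evicted;
}
```

With three resident objects of 400, 300 and 300 units and a budget of 600, the two with the oldest lastUsedFrame get evicted and the most recently used one stays resident.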
In essence, communication between system RAM and VRAM needs to go through the PCIe bus. In the Direct3D 12 API, there are multiple ways to perform such a data upload. It allows creating 4 types of heaps (DEFAULT, UPLOAD, READBACK and CUSTOM), selected through the Type member of the D3D12_HEAP_PROPERTIES stored in the heap description.
Since the UPLOAD heap is GPU-visible as well, we can let the GPU access its contents directly in a shader.
In D3D12, we use the ID3D12Device::CreateHeap method to create a heap that can be used with placed and reserved resources. This internally initiates a very long sequence of operations across multiple layers of the graphics stack.
First, the application code calls CreateHeap() with a D3D12_HEAP_DESC struct. This structure provides:
UINT64 SizeInBytes;
D3D12_HEAP_PROPERTIES Properties;
UINT64 Alignment;
D3D12_HEAP_FLAGS Flags;
This call goes into the D3D12 runtime, which is a Microsoft-provided DLL that sits between the application and the driver.
The runtime performs validation, such as checking that the size is reasonable and the flags are compatible. If the debug layer is enabled, it performs much more extensive validation and also warns about suboptimal usage patterns. The runtime maintains its own tracking structures for all D3D12 objects you create, which helps with debugging and validation.
Once validated, the call passes down to the user-mode driver, which is vendor-specific code. This is where things get really interesting because the driver has deep knowledge about the actual GPU hardware and its capabilities. The driver examines the heap request in the context of the current memory situation. It checks how much VRAM is currently available, what else is allocated, and whether the request can be satisfied immediately. The driver maintains its own memory manager that tracks allocations at a finer granularity than the application sees. A heap allocation might be carved out of a larger internal allocation, or it might require the driver to request new memory from the kernel.
The driver then issues a call down to the kernel-mode driver, which runs at the lowest level with full system privileges. The kernel driver is responsible for actually talking to the GPU hardware and managing the physical memory resources. It’s the kernel driver that programs the GPU’s memory management unit to set up the page tables that map virtual GPU addresses to physical memory locations.
Modern GPUs can also use virtual memory much like CPUs do. When a heap is created, we get a range of GPU virtual address space. The GPU’s MMU translates these virtual addresses to physical addresses in VRAM or system RAM. This virtualization is what allows the OS to page memory in and out, because it can change the page table mappings without the GPU’s shader programs needing to know that addresses have been remapped.

Read more about the GPU MMU model here.
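The remapping idea is easier to see with a toy page table. This is just an illustration under my own assumptions: ToyPageTable and the 64KB page size are hypothetical, and a real GPU MMU uses multi-level tables managed by the kernel driver, not a hash map.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Assumed page size for the toy model only.
constexpr uint64_t kPageSize = 64 * 1024;

// Hypothetical single-level page table: virtual page index -> physical page index.
class ToyPageTable
{
public:
    // Maps (or remaps) a virtual page to a physical page.
    void map(uint64_t virtualPage, uint64_t physicalPage) { mPages[virtualPage] = physicalPage; }

    // Removes the mapping, as if the page had been paged out.
    void unmap(uint64_t virtualPage) { mPages.erase(virtualPage); }

    // Translates a virtual address. Returns false when the page is not mapped.
    bool translate(uint64_t virtualAddress, uint64_t& physicalAddress) const
    {
        const auto it = mPages.find(virtualAddress / kPageSize);
        if (it == mPages.end())
            return false;
        physicalAddress = it->second * kPageSize + virtualAddress % kPageSize;
        return true;
    }

private:
    std::unordered_map<uint64_t, uint64_t> mPages;
};
```

The point is that the virtual address a shader uses never changes: calling map() again for the same virtual page makes the same address resolve to a different physical location.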
So as you can see, creating a heap is quite a heavy operation. So we create our heaps once for the entire application or game during loading and then manage the resource placement internally.
Creating a placed resource within a heap using CreatePlacedResource carves out a sub-region of the heap’s virtual address space and gives it structure. The driver doesn’t allocate new memory because the memory already exists in the heap. Instead, it creates a resource description that defines how to interpret that region of memory.
The driver calculates the actual GPU virtual address for the resource by taking the heap’s base virtual address and adding the requested offset. It validates that the resource fits within the heap boundaries and that alignment requirements are met. Placed resources use a 64KB alignment by default (D3D12_DEFAULT_RESOURCE_PLACEMENT_ALIGNMENT, with 4MB for MSAA textures); for textures this comes from how the GPU’s texture units are designed and how they manage memory access patterns for efficient filtering.
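The offset math described above is small enough to sketch. This is a minimal version under my own assumptions: HeapRange, canPlace and resourceAddress are hypothetical helpers, and the 64KB constant is redeclared so that the snippet builds without d3d12.h.

```cpp
#include <cassert>
#include <cstdint>

// Same value as D3D12_DEFAULT_RESOURCE_PLACEMENT_ALIGNMENT (64 KB),
// redeclared here so the sketch does not depend on d3d12.h.
constexpr uint64_t kPlacementAlignment = 64 * 1024;

// Rounds value up to the next multiple of alignment.
// alignment must be a power of two.
constexpr uint64_t alignUp(uint64_t value, uint64_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical stand-in for a heap: its base GPU virtual address and its size.
struct HeapRange
{
    uint64_t baseAddress;
    uint64_t sizeInBytes;
};

// True when the offset is placement-aligned and [offset, offset + sizeInBytes)
// lies entirely inside the heap.
constexpr bool canPlace(const HeapRange& heap, uint64_t offset, uint64_t sizeInBytes)
{
    return offset % kPlacementAlignment == 0
        && offset <= heap.sizeInBytes
        && sizeInBytes <= heap.sizeInBytes - offset;
}

// GPU virtual address of a resource placed at offset inside the heap.
constexpr uint64_t resourceAddress(const HeapRange& heap, uint64_t offset)
{
    return heap.baseAddress + offset;
}
```

So if the previous resource ended at byte 70000, the next one would start at alignUp(70000, kPlacementAlignment), which is 131072, and its address is simply the heap base plus that offset.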
Now that we have a decent idea about how the memory works internally, we can take a look at how we can manage the resource placement.
An ID3D12Heap* holds the heap object. Here is a simple wrapper over ID3D12Heap that creates the heap in its constructor by calling ID3D12Device::CreateHeap and releases it in its destructor.
/*
 * Wrapper over ID3D12Heap
 */
class Heap
{
public:
    static constexpr uint64_t PLACEMENT_ALIGNMENT = D3D12_DEFAULT_RESOURCE_PLACEMENT_ALIGNMENT;
    struct Description
    {
        std::string name;
        size_t sizeInBytes;
        HeapType type;
    };
    // Creates and initializes an ID3D12Heap
    Heap(Device& device, const Description& createDescription);
    // Frees the heap.
    // Since mHeap is a ComPtr it gets released automatically when the Heap object is destroyed.
    ~Heap();
public:
    // Returns the raw pointer of ID3D12Heap.
    inline const ID3D12Heap* get() const { return mHeap.Get(); }
    inline ID3D12Heap* get() { return mHeap.Get(); }
public:
    static D3D12_HEAP_PROPERTIES toD3DHeapProperties(const Description& description);
private:
    Microsoft::WRL::ComPtr<ID3D12Heap> mHeap;
};
When dealing with GPU resources, we generally sort them into a few categories and section our heap for each one. The core idea is that the categories are based not only on the resource type but also on residency, i.e. how long we plan to keep the resource in memory. This lets us use a different allocation strategy per category and keep fragmentation to a minimum.
Here is one possible way to section the heap.
Using all of this, we can write a new class around our heap wrapper that looks something like the following. It makes the sections using a simple linear allocator and chooses an appropriate allocator based on the resource category provided in the ResourceDescription, with some helper function overloads to create all the different types of GPU resources.
/*
 * A resource heap with a custom allocator.
 * It allocates the resource based on the specified ResourceCategory.
 * It contains all different type of heaps ie Upload, Default and ReadBack.
 */
class HeapWithAllocator
{
public:
    enum ResourceCategory
    {
        UploadData,
        UploadPersistentData,
        TransientData,
        ConstantsData,
        MeshData,
        RenderTargetData,
        TextureData,
        PersistentData,
        COUNT
    };
    struct ResourceDescription
    {
        ResourceCategory category;
        GPUResource::Description resource;
    };
public:
    HeapWithAllocator(Device& device);
    ByteAddressBuffer createByteAddressBuffer(const ResourceDescription& createDescription);
    StructuredBuffer createStructuredBuffer(const ResourceDescription& createDescription, const unsigned int elementCount, const unsigned int strideInBytes);
    TypedBuffer createTypedBuffer(const ResourceDescription& createDescription, const unsigned int elementCount, const unsigned int strideInBytes);
    Texture createTexture(const ResourceDescription& createDescription);
    RenderTarget createRenderTarget(Texture& texture, const RenderTarget::Type type = RenderTarget::Type::Color);
    void deallocateBuffer(const ByteAddressBuffer& buffer, const ResourceCategory category);
    void deallocateBuffer(const StructuredBuffer& buffer, const ResourceCategory category);
    void deallocateBuffer(const TypedBuffer& buffer, const ResourceCategory category);
    void deallocateTexture(const Texture& texture, const ResourceCategory category);
    inline void reset(const ResourceCategory category) { mAllocator.reset(category); }
    ~HeapWithAllocator() = default;
private:
    Device& mDevice;
private:
    /*
     * Custom allocator using the basic allocators from Core/Memory
     * Linear allocator is used to virtually segment the heap and a different allocator is used based on the resource category.
     */
    class ResourceAllocator
    {
    public:
        // Total memory budget of GPU memory.
        // This includes the budget of Upload heap and Readback heap as well.
        // 2 GB TOTAL.
        // 128 MB for upload heap.
        // 1664 MB for default heap.
        // 256 MB extra.
        // Entries are indexed by ResourceCategory, so they must follow the enum order.
        static constexpr std::array<size_t, ResourceCategory::COUNT> MEMORY_BUDGET = {
            memory::MB(64), // Category::Upload
            memory::MB(64), // Category::UploadPersistent
            memory::MB(64), // Category::Transient
            memory::MB(64), // Category::Constants
            memory::MB(256), // Category::Mesh
            memory::MB(128), // Category::RenderTarget
            memory::MB(640), // Category::Texture
            memory::MB(256) // Category::Persistent
        };
    public:
        ResourceAllocator()
            :
            mUploadHeapSections(memory::MB(128)),
            mUploadData(MEMORY_BUDGET[ResourceCategory::UploadData], mUploadHeapSections.allocate(MEMORY_BUDGET[ResourceCategory::UploadData])),
            mUploadPersistentData(MEMORY_BUDGET[ResourceCategory::UploadPersistentData], mUploadHeapSections.allocate(MEMORY_BUDGET[ResourceCategory::UploadPersistentData])),
            mDefaultHeapSections(memory::MB(1664)),
            mConstants(MEMORY_BUDGET[ResourceCategory::ConstantsData], 256, mDefaultHeapSections.allocate(MEMORY_BUDGET[ResourceCategory::ConstantsData])),
            mTransientData(MEMORY_BUDGET[ResourceCategory::TransientData], mDefaultHeapSections.allocate(MEMORY_BUDGET[ResourceCategory::TransientData])),
            mMesh(MEMORY_BUDGET[ResourceCategory::MeshData], 128, mDefaultHeapSections.allocate(MEMORY_BUDGET[ResourceCategory::MeshData])),
            mRenderTarget(MEMORY_BUDGET[ResourceCategory::RenderTargetData], mDefaultHeapSections.allocate(MEMORY_BUDGET[ResourceCategory::RenderTargetData])),
            mTexture(MEMORY_BUDGET[ResourceCategory::TextureData], mDefaultHeapSections.allocate(MEMORY_BUDGET[ResourceCategory::TextureData])),
            mPersistentData(MEMORY_BUDGET[ResourceCategory::PersistentData], mDefaultHeapSections.allocate(MEMORY_BUDGET[ResourceCategory::PersistentData]))
        {
        }
        // Allocates the sizeInBytes and returns an allocation offset for the specified category.
        inline uint64_t allocate(const ResourceCategory category, const size_t sizeInBytes)
        {
            std::lock_guard<std::mutex> scopedLockGuard(mMutex[category]);
            switch (category)
            {
            default:
            case ResourceCategory::COUNT:
                HASSERT(false, "Invalid resource category ", category);
                // Return an invalid offset so that control never falls off the end of a non-void function.
                return UINT64_MAX;
            case ResourceCategory::UploadData:
                return mUploadData.allocate(sizeInBytes);
            case ResourceCategory::UploadPersistentData:
                return mUploadPersistentData.allocate(sizeInBytes);
            case ResourceCategory::ConstantsData:
                return mConstants.allocate(sizeInBytes);
            case ResourceCategory::TransientData:
                return mTransientData.allocate(sizeInBytes);
            case ResourceCategory::MeshData:
                return mMesh.allocate(sizeInBytes);
            case ResourceCategory::RenderTargetData:
                return mRenderTarget.allocate(sizeInBytes);
            case ResourceCategory::TextureData:
                return mTexture.allocate(sizeInBytes);
            case ResourceCategory::PersistentData:
                return mPersistentData.allocate(sizeInBytes);
            }
        }
        // Provides a method to deallocate a specific offset but since ring allocators reuse the memory, no deallocation happens for ring.
        inline void deallocate(const ResourceCategory category, const uint64_t allocationOffset, const size_t sizeInBytes)
        {
            std::lock_guard<std::mutex> scopedLockGuard(mMutex[category]);
            switch (category)
            {
            default:
            case ResourceCategory::COUNT:
            case ResourceCategory::UploadData:
            case ResourceCategory::RenderTargetData:
            case ResourceCategory::PersistentData:
            case ResourceCategory::TransientData:
                //HASSERT_SOFT_ONCE(false, "Invalid resource category %d. No deallocation will happen.", category);
                break;
            case ResourceCategory::UploadPersistentData:
                mUploadPersistentData.deallocate(allocationOffset, sizeInBytes);
                break;
            case ResourceCategory::ConstantsData:
                mConstants.deallocate(allocationOffset, sizeInBytes);
                break;
            case ResourceCategory::MeshData:
                mMesh.deallocate(allocationOffset, sizeInBytes);
                break;
            case ResourceCategory::TextureData:
                mTexture.deallocate(allocationOffset, sizeInBytes);
                break;
            }
        }
        // Resets the allocator for the specified ResourceCategory.
        inline void reset(const ResourceCategory category)
        {
            std::lock_guard<std::mutex> scopedLockGuard(mMutex[category]);
            switch (category)
            {
            default:
            case ResourceCategory::COUNT:
                HASSERT(false, "Invalid resource category ", category);
                break;
            case ResourceCategory::UploadData:
                mUploadData.reset();
                break;
            case ResourceCategory::UploadPersistentData:
                mUploadPersistentData.reset();
                break;
            case ResourceCategory::ConstantsData:
                mConstants.reset();
                break;
            case ResourceCategory::TransientData:
                mTransientData.reset();
                break;
            case ResourceCategory::MeshData:
                mMesh.reset();
                break;
            case ResourceCategory::RenderTargetData:
                mRenderTarget.reset();
                break;
            case ResourceCategory::TextureData:
                mTexture.reset();
                break;
            case ResourceCategory::PersistentData:
                mPersistentData.reset();
                break;
            }
        }
        ~ResourceAllocator() = default;
    private:
        memory::LinearAllocator mUploadHeapSections;
        memory::RingAllocator mUploadData;
        memory::SuballocationAllocator mUploadPersistentData;
        memory::LinearAllocator mDefaultHeapSections;
        memory::BlockAllocator mConstants;
        memory::RingAllocator mTransientData;
        memory::BlockAllocator mMesh;
        memory::RingAllocator mRenderTarget;
        memory::SuballocationAllocator mTexture;
        memory::LinearAllocator mPersistentData;
        std::array<std::mutex, ResourceCategory::COUNT> mMutex;
    };
    ResourceAllocator mAllocator;
private:
    Heap mUploadHeap;
    Heap mDefaultHeap;
    // Heap mReadbackHeap;
    inline Heap& getBackingHeap(const ResourceCategory category)
    {
        switch (category)
        {
        case ResourceCategory::UploadData:
        case ResourceCategory::UploadPersistentData:
            return mUploadHeap;
        case ResourceCategory::TransientData:
        case ResourceCategory::ConstantsData:
        case ResourceCategory::MeshData:
        case ResourceCategory::RenderTargetData:
        case ResourceCategory::TextureData:
        case ResourceCategory::PersistentData:
        case ResourceCategory::COUNT:
        default:
            return mDefaultHeap;
        }
    }
};
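The memory:: allocators from Core/Memory are not shown in this post, so here is a minimal sketch of how the sectioning could work. It assumes a linear allocator that only hands out section base offsets and a ring allocator that wraps around inside its own section. LinearAllocatorSketch, RingAllocatorSketch, kInvalidOffset and MB() are names I made up for illustration; they are not the engine's actual classes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sentinel for a failed allocation.
constexpr uint64_t kInvalidOffset = UINT64_MAX;

// Hypothetical helper converting megabytes to bytes.
constexpr uint64_t MB(uint64_t megabytes) { return megabytes * 1024 * 1024; }

// Bump allocator: hands out consecutive offsets and never frees individually.
class LinearAllocatorSketch
{
public:
    explicit LinearAllocatorSketch(uint64_t sizeInBytes, uint64_t baseOffset = 0)
        : mBase(baseOffset), mSize(sizeInBytes), mUsed(0) {}

    uint64_t allocate(uint64_t sizeInBytes)
    {
        if (sizeInBytes > mSize - mUsed)
            return kInvalidOffset;
        const uint64_t offset = mBase + mUsed;
        mUsed += sizeInBytes;
        return offset;
    }

    void reset() { mUsed = 0; }

private:
    uint64_t mBase;
    uint64_t mSize;
    uint64_t mUsed;
};

// Ring allocator: wraps to the start of its section when the tail is too small.
// It reuses memory without checking, so the caller must make sure the GPU is
// done with the old contents (for example by waiting on a fence).
class RingAllocatorSketch
{
public:
    RingAllocatorSketch(uint64_t sizeInBytes, uint64_t baseOffset)
        : mBase(baseOffset), mSize(sizeInBytes), mHead(0) {}

    uint64_t allocate(uint64_t sizeInBytes)
    {
        if (sizeInBytes > mSize)
            return kInvalidOffset;
        if (sizeInBytes > mSize - mHead)
            mHead = 0; // Not enough room at the tail, wrap around.
        const uint64_t offset = mBase + mHead;
        mHead += sizeInBytes;
        return offset;
    }

    void reset() { mHead = 0; }

private:
    uint64_t mBase;
    uint64_t mSize;
    uint64_t mHead;
};
```

The linear allocator is used once per heap to carve out the sections, and each category allocator then works inside its own section, so every offset it returns is already a valid offset into the backing heap.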
A descriptor in D3D12 is a small, well-defined block of data that lives in GPU-accessible memory and tells the GPU hardware directly how to interpret and access a resource. We can think of a descriptor as a view of a resource. The same resource can have several views, each specifying a different way to interpret the data.
That is a good reason to keep descriptors separate from the actual GPU resources and let users create them on demand. But when it comes to managing them, it is no different from ID3D12Heap. In fact, it gets easier here because descriptors are fixed-size structs, so managing the offsets is quite simple and can be done with a simple ring allocator.
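Because every descriptor in a heap has the same size, the allocator only has to hand out slot indices. Here is a minimal sketch under my own assumptions: DescriptorRingSketch is a hypothetical class, and in real D3D12 code the increment would come from ID3D12Device::GetDescriptorHandleIncrementSize and the start address from the descriptor heap itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical ring allocator over fixed-size descriptor slots.
class DescriptorRingSketch
{
public:
    DescriptorRingSketch(uint64_t heapStart, uint32_t incrementSize, uint32_t capacity)
        : mHeapStart(heapStart), mIncrementSize(incrementSize), mCapacity(capacity), mNext(0) {}

    // Returns the address of the next slot and advances, wrapping at capacity.
    // Wrapping overwrites the oldest slot, so the caller must make sure that
    // descriptor is no longer in use.
    uint64_t allocate()
    {
        const uint64_t handle = mHeapStart + static_cast<uint64_t>(mNext) * mIncrementSize;
        mNext = (mNext + 1) % mCapacity;
        return handle;
    }

private:
    uint64_t mHeapStart;     // Address of slot 0.
    uint32_t mIncrementSize; // Size of one descriptor in bytes.
    uint32_t mCapacity;      // Number of slots in the heap.
    uint32_t mNext;          // Index of the next slot to hand out.
};
```

There is no size parameter and no fragmentation to worry about, which is what makes descriptor management simpler than resource placement.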
However, when we pair this with binding resources to shaders, the process gets more intricate, which we will look at in the next blog.
But purely for descriptor memory management, it can be kept the same as the resource heap, with 4 separate descriptor heaps for ShaderViewsCPU, ShaderViewsGPU, RenderTargetCPU and DepthStencilCPU.
We have done a deep dive into the GPU’s memory hierarchy, created a small wrapper over ID3D12Heap and defined a clear way to categorize resources so that an appropriate allocation strategy can be chosen. Descriptor memory management works the same way.
If you have any suggestions or comments, if you like this post, or if you want to learn more, let’s connect on Twitter @JayNakum_ or LinkedIn @JayNakum.
© 2026 Jay Nakum. All rights reserved.
Any direct or indirect use of all content requires prior written permission. No AI training allowed.