Resource Heaps

Creating a Heap
Sub-Allocating Heap Resources

Creating a Fence
Tracking Fences Across Blit and Compute Command Encoders
Tracking Fences Across Render Command Encoders
Fence Examples

Separate Heaps for Render Target Types
Separate Heaps for Aliasable and Non-Aliasable Resources
Separate Heaps to Reduce Fragmentation
Minimize Fencing
Consider Tracking Non-Heap Resources

Available in: iOS_GPUFamily1_v3, iOS_GPUFamily2_v3, iOS_GPUFamily3_v2, tvOS_GPUFamily1_v2

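Resource heaps allow Metal resources to be backed by the same memory allocation. These resources are created from a memory pool called a heap and are tracked by fences, which capture and manage GPU work dependencies. Resource heaps help your app reduce the following costs: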

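Resource creation. Creating a resource involves allocating new memory, mapping it into the process, and zero-filling it. You can reduce this cost by creating resources from one large heap or by reusing memory that is already in the heap.
Fixed memory budget. If some resources have not been used for a while, the virtual memory system may compress the memory they occupy to save space. Making that memory available again for the next use then costs extra time. Using a small number of heaps lets you keep your allocations within a fixed memory budget and ensures the resources stay in use, which also helps keep performance consistent.
Transient resources. Transient resources are created and used every frame, but not all of them are used at the same time. To reduce memory consumption, transient resources that are not used at the same time can share the same memory through a heap.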

Heaps

An MTLHeap is a Metal object that represents an abstract memory pool. Resources created from a heap are either aliasable or non-aliasable. An aliased sub-allocated resource shares the same heap memory with other aliased resources.

Creating a Heap

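An MTLHeap is created by calling the newHeapWithDescriptor: method of MTLDevice. An MTLHeapDescriptor specifies the storage mode, the CPU cache mode, and the size of the heap in bytes. All resources sub-allocated from the same heap share that storage mode and CPU cache mode. The heap must be large enough to hold the resources that are allocated from it.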

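After a heap is created, you can make it purgeable by calling its setPurgeableState: method. The heap's purgeability state applies to all of its backing memory and therefore affects every resource associated with it. The heap is purgeable, but its resources are not individually purgeable; a sub-allocated resource can only reflect the heap's purgeability state. Purgeability is particularly useful for heaps that contain only render targets.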
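As a rough illustration of the purgeability paragraph above, the sketch below marks a heap volatile while its render targets are idle and later reclaims it. The helper names MarkHeapVolatile and ReclaimHeap are hypothetical; only setPurgeableState: and the MTLPurgeableState constants come from Metal. It assumes no in-flight command buffer is still using resources from the heap.

```objc
#import <Metal/Metal.h>

// Hypothetical helper: let the system discard the heap's backing memory
// under memory pressure while its render targets are not in use.
static void MarkHeapVolatile(id<MTLHeap> renderTargetHeap)
{
    [renderTargetHeap setPurgeableState:MTLPurgeableStateVolatile];
}

// Hypothetical helper: make the heap non-volatile again before reuse.
// setPurgeableState: returns the previous state; MTLPurgeableStateEmpty
// means the contents were discarded and every sub-allocated resource
// must be re-rendered before it is read.
static BOOL ReclaimHeap(id<MTLHeap> renderTargetHeap)
{
    MTLPurgeableState previous =
        [renderTargetHeap setPurgeableState:MTLPurgeableStateNonVolatile];
    return previous != MTLPurgeableStateEmpty;
}
```

Because the state belongs to the heap, a NO result from ReclaimHeap applies to all resources in it at once, which is one reason a render-target-only heap is a convenient unit here.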

Sub-Allocating Heap Resources

Both MTLBuffer and MTLTexture objects can be sub-allocated from a heap. To do so, call one of the following MTLHeap methods:

newBufferWithLength:options:
newTextureWithDescriptor:

Every sub-allocated resource is non-aliasable by default, which prevents later sub-allocations from using its memory. Call the makeAliasable method to mark a sub-allocated resource as aliasable; later sub-allocations can then reuse its memory.

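Aliasable sub-allocated resources are not destroyed and can still be used by command encoders. These resources hold a strong reference to their heap, and that reference is released only when the resource itself is destroyed, not when it is marked aliasable. A sub-allocated resource can be destroyed only after every command buffer that uses it has completed.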

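Note: Heaps are thread-safe, but you still need to synchronize heap access at the app level to make sure aliasing is set up as you expect. Command dependencies between sub-allocated resources are not tracked automatically; you must track them explicitly with MTLFence.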
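A minimal sketch of the aliasing flow described above, assuming a hypothetical helper name and two hypothetical texture descriptors (shadowDescriptor and bloomDescriptor) whose storage mode matches the heap's. It only shows the order of calls; the fencing between the passes that use the two textures is left to the caller.

```objc
#import <Metal/Metal.h>

// Hypothetical helper: sub-allocates a transient texture, releases its
// memory for reuse, and sub-allocates a second texture that may alias it.
static id<MTLTexture> ReuseTransientMemory(id<MTLHeap> heap,
                                           MTLTextureDescriptor *shadowDescriptor,
                                           MTLTextureDescriptor *bloomDescriptor)
{
    // Non-aliasable by default: its memory is reserved within the heap.
    id<MTLTexture> shadowMap = [heap newTextureWithDescriptor:shadowDescriptor];

    /* Encode the passes that read and write 'shadowMap' here. */

    // The texture object stays valid, but later sub-allocations may now
    // be placed over the same memory.
    [shadowMap makeAliasable];

    // May return nil if the heap has no room. The result can overlap
    // 'shadowMap', so passes using the two textures need a fence between them.
    id<MTLTexture> bloomTarget = [heap newTextureWithDescriptor:bloomDescriptor];
    return bloomTarget;
}
```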

The following code shows a simple sub-allocation from a heap:

    // Calculate the size and alignment of each resource
    MTLSizeAndAlign albedoSizeAndAlign = [_device heapTextureSizeAndAlignWithTextureDescriptor:_albedoDescriptor];
    MTLSizeAndAlign normalSizeAndAlign = [_device heapTextureSizeAndAlignWithTextureDescriptor:_normalDescriptor];
    MTLSizeAndAlign glossSizeAndAlign  = [_device heapTextureSizeAndAlignWithTextureDescriptor:_glossDescriptor];

    // Calculate a heap size that satisfies the size requirements of all three resources
    NSUInteger heapSize = albedoSizeAndAlign.size + normalSizeAndAlign.size + glossSizeAndAlign.size;

    // Create a heap descriptor
    MTLHeapDescriptor *heapDescriptor = [MTLHeapDescriptor new];
    heapDescriptor.cpuCacheMode = MTLCPUCacheModeDefaultCache;
    heapDescriptor.storageMode = MTLStorageModePrivate;
    heapDescriptor.size = heapSize;

    // Create a heap
    id <MTLHeap> heap = [_device newHeapWithDescriptor:heapDescriptor];

    // Create sub-allocated resources from the heap
    id <MTLTexture> albedoTexture = [heap newTextureWithDescriptor:_albedoDescriptor];
    id <MTLTexture> normalTexture = [heap newTextureWithDescriptor:_normalDescriptor];
    id <MTLTexture> glossTexture  = [heap newTextureWithDescriptor:_glossDescriptor];
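The listing above sums only the sizes. If the reported alignments matter for a given set of descriptors, a more conservative estimate can round the running total up to each resource's alignment before adding its size. The sketch below is one way to do that; AlignUp and HeapSizeForTextures are hypothetical helper names, and how the heap actually places resources internally is an assumption, not something the source states.

```objc
#import <Metal/Metal.h>

// Hypothetical helper: rounds 'offset' up to the next multiple of 'alignment'.
// Assumes 'alignment' is non-zero.
static NSUInteger AlignUp(NSUInteger offset, NSUInteger alignment)
{
    return ((offset + alignment - 1) / alignment) * alignment;
}

// Hypothetical helper: a conservative heap size for a list of textures,
// padding each resource to its reported alignment.
static NSUInteger HeapSizeForTextures(id<MTLDevice> device,
                                      NSArray<MTLTextureDescriptor *> *descriptors)
{
    NSUInteger heapSize = 0;
    for (MTLTextureDescriptor *descriptor in descriptors) {
        MTLSizeAndAlign sizeAndAlign =
            [device heapTextureSizeAndAlignWithTextureDescriptor:descriptor];
        heapSize = AlignUp(heapSize, sizeAndAlign.align) + sizeAndAlign.size;
    }
    return heapSize;
}
```

For example, three resources of 100 bytes each with a 256-byte alignment would give 612 bytes here (0 + 100, then 256 + 100, then 512 + 100), compared with 300 bytes from a plain sum.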

Fences

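MTLFence is used to track and manage dependencies on sub-allocated resources across command encoders. A resource dependency arises when a resource is produced and consumed by different commands, regardless of whether those commands are encoded to the same queue or to different queues. A fence captures GPU work up to a specific point. When the GPU encounters a fence, it must wait until all the captured work has completed before continuing.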

Creating a Fence

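Create an MTLFence object by calling the newFence method of MTLDevice. A fence is used only for tracking, and it supports tracking only within the GPU, not between the CPU and the GPU. MTLFence provides no methods or completion handlers; the only thing you can modify is its label property.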

Note: A fence can be updated repeatedly. The hardware manages fence updates to prevent deadlock.

Tracking Fences Across Blit and Compute Command Encoders

MTLBlitCommandEncoder and MTLComputeCommandEncoder can both be tracked with fences. Call the updateFence: method to update a fence, and call the waitForFence: method to wait for a fence.

Fences are updated or evaluated when the command buffer is actually submitted to the hardware. This maintains global ordering and prevents deadlock.

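The driver may wait on fences at the beginning of a command encoder, and it may delay fence updates until the end of a command encoder. As a result, you cannot update a fence and then wait on the same fence within a single command encoder (waiting first and then updating is allowed). Producer-consumer relationships must be split across different command encoders.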
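To make the ordering rule above concrete, here is a small sketch with a blit producer and a compute consumer. The function name and the heapFence parameter are hypothetical; the encoder calls are the standard waitForFence: and updateFence: methods. The blit encoder waits first and updates afterwards, which is the permitted order within one encoder.

```objc
#import <Metal/Metal.h>

// Hypothetical helper: a blit pass produces data that a compute pass consumes.
static void EncodeUploadThenProcess(id<MTLCommandBuffer> commandBuffer,
                                    id<MTLFence> heapFence)
{
    // Producer: wait for earlier users of the memory, then signal completion.
    id<MTLBlitCommandEncoder> blitEncoder = [commandBuffer blitCommandEncoder];
    [blitEncoder waitForFence:heapFence];   // wait first ...
    /* Copy into resources associated with 'heapFence' */
    [blitEncoder updateFence:heapFence];    // ... then update: allowed
    [blitEncoder endEncoding];

    // Consumer: a separate encoder, because update-then-wait on the same
    // fence inside one encoder is not allowed.
    id<MTLComputeCommandEncoder> computeEncoder = [commandBuffer computeCommandEncoder];
    [computeEncoder waitForFence:heapFence];
    /* Dispatch using resources associated with 'heapFence' */
    [computeEncoder endEncoding];
}
```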

Tracking Fences Across Render Command Encoders

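MTLRenderCommandEncoder can be tracked with fences at a finer granularity. MTLRenderStages lets you specify the render stage at which a fence is updated or waited on, which allows vertex and fragment work to overlap. Call the updateFence:afterStages: method to update a fence, and call the waitForFence:beforeStages: method to wait for a fence.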
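The stage-level calls can be sketched as follows, assuming a hypothetical shadow pass whose output is sampled only in the fragment stage of a later lighting pass. The function name, pass descriptors, and fence parameter are illustrative. Because the second encoder waits only before its fragment stage, its vertex work is free to overlap with the first pass.

```objc
#import <Metal/Metal.h>

// Hypothetical helper: the lighting pass samples the shadow map only in
// its fragment stage, so only that stage waits on the fence.
static void EncodeShadowThenLighting(id<MTLCommandBuffer> commandBuffer,
                                     MTLRenderPassDescriptor *shadowPass,
                                     MTLRenderPassDescriptor *lightingPass,
                                     id<MTLFence> shadowFence)
{
    id<MTLRenderCommandEncoder> shadowEncoder =
        [commandBuffer renderCommandEncoderWithDescriptor:shadowPass];
    /* Draw into the shadow map */
    [shadowEncoder updateFence:shadowFence afterStages:MTLRenderStageFragment];
    [shadowEncoder endEncoding];

    id<MTLRenderCommandEncoder> lightingEncoder =
        [commandBuffer renderCommandEncoderWithDescriptor:lightingPass];
    [lightingEncoder waitForFence:shadowFence beforeStages:MTLRenderStageFragment];
    /* Draw geometry that samples the shadow map in its fragment function */
    [lightingEncoder endEncoding];
}
```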

Fence Examples

The following code shows a basic use of a fence:

    id <MTLFence> fence = [_device newFence];
    id <MTLCommandBuffer> commandBuffer = [_commandQueue commandBuffer];

    // Producer
    id <MTLRenderCommandEncoder> renderCommandEncoder = [commandBuffer renderCommandEncoderWithDescriptor:_descriptor];
    /* Draw using resources associated with 'fence' */
    [renderCommandEncoder updateFence:fence afterStages:MTLRenderStageFragment];
    [renderCommandEncoder endEncoding];

    // Consumer
    id <MTLComputeCommandEncoder> computeCommandEncoder = [commandBuffer computeCommandEncoder];
    [computeCommandEncoder waitForFence:fence];
    /* Dispatch using resources associated with 'fence' */
    [computeCommandEncoder endEncoding];

    [commandBuffer commit];

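Suppose there are two producer command encoders. You cannot update a fence only in the later one and assume that both encoders will have completed. The consumer command encoder must explicitly wait on a fence for every command encoder it conflicts with. (The GPU may execute as many commands as it can unless it encounters a fence.) The following code shows an incorrect use of a fence that leads to a race condition: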

    id <MTLFence> fence = [_device newFence];
    id <MTLCommandBuffer> commandBuffer = [_commandQueue commandBuffer];

    // Producer 1
    id <MTLRenderCommandEncoder> producerCommandEncoder1 = [commandBuffer renderCommandEncoderWithDescriptor:_descriptor];
    /* Draw using resources associated with 'fence' */
    [producerCommandEncoder1 endEncoding];

    // Producer 2
    id <MTLComputeCommandEncoder> producerCommandEncoder2 = [commandBuffer computeCommandEncoder];
    /* Encode */
    [producerCommandEncoder2 updateFence:fence];
    [producerCommandEncoder2 endEncoding];

    // Race condition at consumption!
    // producerCommandEncoder2 updated the fence and will have completed its work
    // producerCommandEncoder1 did not update the fence and therefore there is no guarantee that it will have completed its work
    // Consumer
    id <MTLComputeCommandEncoder> computeCommandEncoder = [commandBuffer computeCommandEncoder];
    [computeCommandEncoder waitForFence:fence];
    /* Dispatch using resources associated with 'fence' */
    [computeCommandEncoder endEncoding];

    [commandBuffer commit];
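One possible way to remove the race in the listing above is to give each producer its own fence and have the consumer wait on both. This is a sketch under that assumption, not the only valid arrangement; the function name and the fence and descriptor parameters are hypothetical.

```objc
#import <Metal/Metal.h>

// Hypothetical helper: every producer updates a fence, and the consumer
// waits on each fence it depends on. The caller commits the command buffer.
static void EncodeTwoProducersAndConsumer(id<MTLCommandBuffer> commandBuffer,
                                          MTLRenderPassDescriptor *renderPass,
                                          id<MTLFence> renderFence,
                                          id<MTLFence> computeFence)
{
    // Producer 1
    id<MTLRenderCommandEncoder> producer1 =
        [commandBuffer renderCommandEncoderWithDescriptor:renderPass];
    /* Draw using resources associated with 'renderFence' */
    [producer1 updateFence:renderFence afterStages:MTLRenderStageFragment];
    [producer1 endEncoding];

    // Producer 2
    id<MTLComputeCommandEncoder> producer2 = [commandBuffer computeCommandEncoder];
    /* Dispatch using resources associated with 'computeFence' */
    [producer2 updateFence:computeFence];
    [producer2 endEncoding];

    // Consumer: waits explicitly on both producers.
    id<MTLComputeCommandEncoder> consumer = [commandBuffer computeCommandEncoder];
    [consumer waitForFence:renderFence];
    [consumer waitForFence:computeFence];
    /* Dispatch using the output of both producers */
    [consumer endEncoding];
}
```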

Fences do not let you control the ordering of command buffers across queues. To synchronize command buffers that belong to different command queues, you still need to order their submission yourself, as the following code shows:

    id <MTLFence> fence = [_device newFence];
    id <MTLCommandBuffer> commandBuffer0 = [_commandQueue0 commandBuffer];
    id <MTLCommandBuffer> commandBuffer1 = [_commandQueue1 commandBuffer];

    // Producer
    id <MTLRenderCommandEncoder> renderCommandEncoder = [commandBuffer0 renderCommandEncoderWithDescriptor:_descriptor];
    /* Draw using resources associated with 'fence' */
    [renderCommandEncoder updateFence:fence afterStages:MTLRenderStageFragment];
    [renderCommandEncoder endEncoding];

    // Consumer
    id <MTLComputeCommandEncoder> computeCommandEncoder = [commandBuffer1 computeCommandEncoder];
    [computeCommandEncoder waitForFence:fence];
    /* Dispatch using resources associated with 'fence' */
    [computeCommandEncoder endEncoding];

    // Ensure 'commandBuffer0' is scheduled before 'commandBuffer1'
    [commandBuffer0 addScheduledHandler:^(id <MTLCommandBuffer> scheduledBuffer) {
        [commandBuffer1 commit];
    }];
    [commandBuffer0 commit];

Best Practices

Separate Heaps for Render Target Types

Some devices cannot arbitrarily alias sub-allocated resources, for example compressible depth textures and MSAA textures. It is best to create a separate heap for each type of render target: color, depth, stencil, and MSAA.

Separate Heaps for Aliasable and Non-Aliasable Resources

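When you make a sub-allocated resource aliasable, you must assume that it will be aliased by every later sub-allocation from the heap. If you later allocate a non-aliasable resource, such as a long-lived texture, it may alias your temporary resources, and that becomes very hard to track.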

Tracking what is and is not aliased is much simpler if you keep at least two resource heaps: one for aliasable resources (for example, render targets) and one for non-aliasable resources (for example, asset textures and vertex buffers).

Separate Heaps to Reduce Fragmentation

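Creating and destroying many sub-allocated resources of different sizes can fragment memory. Defragmenting requires you to explicitly copy from the fragmented heap to another heap. Alternatively, you can create multiple heaps, each dedicated to sub-allocated resources of similar size.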

A heap can also be used like a stack. When it is used as a stack, no fragmentation occurs.

Minimize Fencing

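Fine-grained fences are hard to manage and reduce the tracking benefit of heaps. Avoid using one fence per sub-allocated resource; instead, use a single fence to track all sub-allocated resources that share the same synchronization requirements.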

Consider Tracking Non-Heap Resources

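Consider manual tracking for resources created directly from MTLDevice as well. Specify the MTLResourceHazardTrackingModeUntracked resource option when you create the resource, then track it with a fence. Manual tracking can reduce the overhead that automatic tracking adds for many read-only resources.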
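A brief sketch of creating such an untracked resource directly from the device. The helper name NewUntrackedBuffer and the choice of private storage are assumptions for illustration; the resource option itself is the one named above.

```objc
#import <Metal/Metal.h>

// Hypothetical helper: creates a device-allocated buffer that Metal does
// not hazard-track. The caller is responsible for ordering GPU access to
// it, for example with a fence obtained from [device newFence].
static id<MTLBuffer> NewUntrackedBuffer(id<MTLDevice> device, NSUInteger length)
{
    MTLResourceOptions options =
        MTLResourceStorageModePrivate | MTLResourceHazardTrackingModeUntracked;
    return [device newBufferWithLength:length options:options];
}
```

Encoders that write the buffer would then call updateFence: and encoders that read it would call waitForFence:, in the same way as for heap resources.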

Sample Code

For an example of how to use heaps and fences, see the MetalHeapsAndFences sample.

That concludes this tutorial. I hope you will keep reading the ones that follow.

Thanks, everyone, and goodbye!

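This is an original technical article that took effort to write; please credit the source when reposting: 电子设备中的画家 | published by 王烁 on April 27, 2022, original link (http://geekfaner.com/shineengine/blog58_MetalProgrammingGuide_13.html)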