P010 is a 4:2:0 chroma subsampled planar format, similar to NV12. Each
component uses 16 bits of storage, of which only the high 10 bits are
used. On DX12 this maps to DXGI_FORMAT_P010, and on Vulkan this maps to
G10X6_B10X6R10X6_2PLANE_420_UNORM_3PACK16.
The existing "nv12" gpu test module has been renamed to
"planar_texture", and a new test P010_TEXTURE_CREATION_SAMPLING has
been added similar to the existing NV12_TEXTURE_CREATION_SAMPLING. The
remaining tests in this module have been converted to validation tests,
and now test both NV12 and P010 formats.
This implements the DX12 HAL part of external texture support, which is
the final piece of the puzzle for external textures on DirectX 12.
When creating a pipeline layout, HAL is responsible for mapping a
single BindingType::ExternalTexture bind group entry to multiple
descriptor ranges in the root signature it creates: 3 SRVs (one for
each texture plane) and a CBV for the parameters buffer. Additionally
we must expose the additional bindings to the Naga backend via the
`external_texture_binding_map`. Lastly, when creating a bind group we
write the descriptors for each of these bindings to the heap.
And with that, we can finally enable the `EXTERNAL_TEXTURE` feature
for the dx12 backend.
This adds HLSL backend support for `ImageClass::External` (ie WGSL's
`external_texture` texture type).
For each external texture global variable in the IR, we declare 3
`Texture2D` globals as well as a `cbuffer` for the params. The
additional bindings required by these are found in the newly added
`external_texture_binding_map`. Unique names for each can be obtained
using `NameKey::ExternalTextureGlobalVariable`.
For functions that contain ImageQuery::Size, ImageLoad, or ImageSample
expressions for external textures, ensure we have generated wrapper
functions for those expressions. When emitting code for the
expressions themselves, simply insert a call to the wrapper function.
For size queries, we return the value provided in the params
struct. If that value is [0, 0] then we query the size of the plane 0
texture and return that.
For load and sample, we sample the textures based on the number of
planes specified in the params struct. If there is more than one plane
we additionally perform YUV to RGB conversion using the provided
matrix.
Unfortunately HLSL does not allow structs to contain textures, meaning
we are unable to wrap the 3 textures and params struct variables in a
single variable that can be passed around.
For our wrapper functions we therefore ensure they take the three
textures and the params as consecutive arguments. Likewise, when
declaring user-defined functions with external texture arguments, we
expand the single external texture argument into 4 consecutive
arguments. (Using NameKey::ExternalTextureFunctionArgument to ensure
unique names for each.)
Thankfully external textures can only be used as either global
variables or function arguments. This means we only have to handle the
`Expression::GlobalVariable` and `Expression::FunctionArgument` cases
of `write_expr()`. Since in both cases we know the external texture
can only be an argument to either a user-defined function or one of
our wrapper functions, we can simply emit the names of the variables
for each three textures and the params struct in a comma-separated
list.
Adds a `BindingResource` variant for external textures. In core's
create_bind_group() implementation, allow binding either external
textures or texture views to `BindingType::ExternalTexture` layout
entries.
In either case, provide HAL with a `hal::ExternalTextureBinding`,
consisting of 3 `hal::TextureBinding`s and a `hal::BufferBinding`. In
the texture view case we use the device's default params buffer for
the buffer. When there are fewer than 3 planes we can simply repeat an
existing plane multiple times - the contents of the params buffer will
ensure the shader only accesses the correct number of planes anyway.
Track the view or external texture in `BindGroupStates` to ensure they
remain alive whilst required.
And finally, add the corresponding API to wgpu, with an implementation
for the wgpu-core backend.
Adds a new feature flag, `EXTERNAL_TEXTURE`, indicating device support
for our implementation of WebGPU's `GPUExternalTexture` [1] which will
land in upcoming patches. Conceptually this would make more sense as a
downlevel flag, as it is a core part of the WebGPU spec which we do not
yet support. We do not want, however, to cause applications to reject
adapters because we have not finished implementing this, so for now we
are making it an opt-in feature.
As an initial step towards supporting this feature, this patch adds a
new `BindingType` corresponding to WebGPU's
`GPUExternalTextureBindingLayout` [2]. This binding type dictates that
when creating a bind group the corresponding entry must be either an
external texture or a texture view with certain additional requirements
[3].
As of yet wgpu has no concept of an external texture (that will follow
in later patches) but for now this patch ensures that texture views
corresponding to an external texture binding type are validated
correctly. Note that as the feature flag is not yet supported on any
real backends, bind group layout creation will fail before getting the
chance to attempt to create a bind group. But in the added tests using
the noop backend we can see this validation taking place.
[1] https://www.w3.org/TR/webgpu/#gpuexternaltexture
[1] https://www.w3.org/TR/webgpu/#dictdef-gpuexternaltexturebindinglayout
[2] https://gpuweb.github.io/gpuweb/#bind-group-creation
Some versions of WARP return an allocation info with SizeInBytes == 0
for very large allocations. Proceeding to attempt to allocate a zero-
sized resource results in a device lost error. It's preferable to
instead return an out of memory error.
Adds a mode to compaction that removes unused functions, global
variables, and named types and overrides. This mode is used
everywhere except the compaction at the end of lowering, where
it is important to preserve unused items for type checking and
other validation of the module.
Pruning all but the active entry point and then compacting makes
`process_overrides` tolerant of missing values for overrides that are
not used by the active entry point.
Fixes#5885