When deriving a bind group layout for a pipeline, bother to enforce
`wgpu_types::Limits::max_bindings_per_bind_group`. Move the check:
- from `wgpu_core::device::bgl::EntryMap::from_entries`, which is only used
for explicit bind group creation
- into `wgpu_core::device::Device::create_bind_group_layout`, which is used
for all bind group layout creation.
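
As a rough illustration of the kind of check being relocated, a minimal sketch might look like the following. The `check_binding_limit` function and `BindingLimitError` type are hypothetical names for this example, not wgpu-core's actual API, and the sketch assumes the limit is enforced on each binding index (every index must be strictly below the limit).

```rust
/// Hypothetical error type for this sketch (not wgpu-core's real error enum).
#[derive(Debug, PartialEq)]
struct BindingLimitError {
    binding: u32,
    maximum: u32,
}

/// Assumed semantics: every binding index must be strictly less than
/// `max_bindings_per_bind_group`. Returns the first offending index.
fn check_binding_limit(
    bindings: &[u32],
    max_bindings_per_bind_group: u32,
) -> Result<(), BindingLimitError> {
    for &binding in bindings {
        if binding >= max_bindings_per_bind_group {
            return Err(BindingLimitError {
                binding,
                maximum: max_bindings_per_bind_group,
            });
        }
    }
    Ok(())
}
```

Running the check in the one function that every layout passes through means derived layouts are covered the same way as explicit ones.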
In the future `transfer` will want to use the `Arc` versions of these
types. I have exported them from `wgpu_core::command::ffi`, to document
that these are used for FFI purposes, and from `wgpu_core::command`, for
backwards compatibility, although I also change the in-tree uses to
use them from `wgpu_types` instead of from `wgpu_core::command`.
It is emptied by `reset_queries` at the end of every render pass, so
it's just keeping an allocation alive, not holding any state. It seems
unlikely that there is sufficient performance gain from reusing the
memory allocation to justify the complexity of additional state at
higher layers.
Previously, the check was skipped if the copy was a single row, which is
not correct. The check should be made whenever bytes_per_row is
specified. It is permissible not to specify bytes_per_row if the copy is
a single row, but if it is specified, it must be aligned.
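
The rule described above can be sketched as follows. This is a simplified illustration, not the code in `validate_texture_buffer_copy`: the function and error names are hypothetical, the constant mirrors `wgpu_types::COPY_BYTES_PER_ROW_ALIGNMENT` (256), and multi-layer copies are ignored for brevity.

```rust
/// Mirrors `wgpu_types::COPY_BYTES_PER_ROW_ALIGNMENT`.
const COPY_BYTES_PER_ROW_ALIGNMENT: u32 = 256;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum RowPitchError {
    /// `bytes_per_row` was given but is not a multiple of the alignment.
    Unaligned,
    /// `bytes_per_row` was omitted for a copy spanning more than one row.
    Missing,
}

/// Simplified check: alignment is enforced whenever `bytes_per_row` is
/// specified, regardless of how many rows the copy covers.
fn validate_bytes_per_row(
    bytes_per_row: Option<u32>,
    rows_in_copy: u32,
) -> Result<(), RowPitchError> {
    match bytes_per_row {
        Some(bpr) if bpr % COPY_BYTES_PER_ROW_ALIGNMENT != 0 => Err(RowPitchError::Unaligned),
        Some(_) => Ok(()),
        None if rows_in_copy > 1 => Err(RowPitchError::Missing),
        None => Ok(()),
    }
}
```

The second assertion one would write against this sketch, a single-row copy with `Some(100)`, is the case that was previously let through.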
Also removes a redundant check of the `offset` alignment.
Since the offset and bytesPerRow alignment checks are not part of
"validating linear texture data", I chose to remove that instance of
them. These checks are now in `validate_texture_buffer_copy`, which
does not correspond 1:1 with the spec, but has a comment explaining how
it does correspond.
TransferError now has separate variants for texture copy formats that
are only forbidden in combination with specific aspects
(CopyFrom/ToForbiddenTextureFormatAspect), and texture copy formats that
are always forbidden, irrespective of the aspect
(CopyFrom/ToForbiddenTextureFormat).
This produces a less confusing error message by not mentioning the
aspect when it is not relevant.
Although the operation of these functions is defined in terms of f16
semantics, the input/output types are not f16, and they are generally
available even when native `f16` support is not. But in at least one
case, they are only available with `f16` support, so add a new downlevel
flag that is cleared when these functions are not available.
Add some infrastructure to simplify testing of missing
capabilities/extensions, and add tests for a few more kinds of f16
usage.
Co-authored-by: Erich Gubler <erichdongubler@gmail.com>
This contains the Metal HAL changes required to support external
textures. When creating a bind group we create resource bindings for
each of the 3 textures and parameters buffer that the external texture
has been lowered to. When creating the pipeline layout we fill the
`BindTarget` accordingly, so that the Naga MSL backend can bind each
of the global variables to which the external texture has been
lowered to each of these resources.
We must also ensure the size of the buffer bound to the parameters
global matches the size of the MSL type, else Metal validation
complains. We do this by adding a padding field to the Rust-side
ExternalTextureParams struct, the size of which is used as the size of
the buffer to allocate.
Lastly we enable `Features::EXTERNAL_TEXTURE` on the Metal backend.
P010 is a 4:2:0 chroma subsampled planar format, similar to NV12. Each
component uses 16 bits of storage, of which only the high 10 bits are
used. On DX12 this maps to DXGI_FORMAT_P010, and on Vulkan this maps to
G10X6_B10X6R10X6_2PLANE_420_UNORM_3PACK16.
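
To make the storage layout concrete, here is a small sketch of how a 10-bit value sits in the high bits of each 16-bit component. The helper names are hypothetical and exist only for this example; normalizing by 1023 is the usual UNORM interpretation of a 10-bit value.

```rust
/// Extract the 10-bit value stored in the high bits of a 16-bit P010 component.
/// The low 6 bits are unused padding.
fn p010_unpack(raw: u16) -> u16 {
    raw >> 6
}

/// Store a 10-bit value (0..=1023) in the high bits of a 16-bit component.
fn p010_pack(value: u16) -> u16 {
    (value & 0x3FF) << 6
}

/// Interpret a raw component as a normalized value in [0.0, 1.0].
fn p010_to_unorm(raw: u16) -> f32 {
    f32::from(p010_unpack(raw)) / 1023.0
}
```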
The existing "nv12" gpu test module has been renamed to
"planar_texture", and a new test P010_TEXTURE_CREATION_SAMPLING has
been added similar to the existing NV12_TEXTURE_CREATION_SAMPLING. The
remaining tests in this module have been converted to validation tests,
and now test both NV12 and P010 formats.
The WebGPU spec for `createBindGroup` [states][spec-ref] (emphasis mine):
> Device timeline initialization steps:
>
> …
>
> 2. If any of the following conditions are unsatisfied generate a
> validation error, invalidate _bindGroup_ and return.
>
> …
>
> For each `GPUBindGroupEntry` _bindingDescriptor_ in
> _descriptor_.`entries`:
>
> - …
>
> - If the defined binding member for _layoutBinding_ is:
>
> - …
>
> - `buffer`
>
> - …
>
> - If _layoutBinding_.`buffer`.`type` is
>
> - …
>
> - `"storage"` or `"read-only-storage"`
>
> - …
>
> - effective buffer binding size(_bufferBinding_) is a multiple of 4.
[spec-ref]: https://www.w3.org/TR/webgpu/#dom-gpudevice-createbindgroup
We were not implementing this check of effective buffer binding size.
Check that it's a multiple of 4, and include the test
`webgpu:api,validation,createBindGroup:buffer,effective_buffer_binding_size:*`
to confirm that this is now implemented as intended.
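
For reference, the check amounts to something like the sketch below. The function names are hypothetical, not wgpu-core's; the definition of the effective binding size follows the spec (the explicit `size` if provided, otherwise the remainder of the buffer after `offset`).

```rust
/// Effective buffer binding size as defined by the WebGPU spec: the explicit
/// size if one was given, otherwise `max(0, buffer_size - offset)`.
fn effective_buffer_binding_size(buffer_size: u64, offset: u64, size: Option<u64>) -> u64 {
    size.unwrap_or_else(|| buffer_size.saturating_sub(offset))
}

/// For `"storage"` and `"read-only-storage"` bindings, the effective
/// binding size must be a multiple of 4.
fn storage_binding_size_is_valid(buffer_size: u64, offset: u64, size: Option<u64>) -> bool {
    effective_buffer_binding_size(buffer_size, offset, size) % 4 == 0
}
```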
This adds several fields to `ExternalTextureDescriptor`, specifying
how to handle color space conversion for an external texture. These
fields consist of transfer functions for the source and destination
color spaces, and a matrix for converting between gamuts. This allows
`ImageSample` and `ImageLoad` operations on external textures to
return values in a desired destination color space rather than the
source color space of the underlying planes.
These fields are plumbed through to the `ExternalTextureParams`
uniform buffer from which they are exposed to the shader. Following
conversion from YUV to RGB after sampling/loading from the external
texture planes, the shader uses them to gamma decode to linear RGB in
the source color space, convert from source to destination gamut, then
finally gamma encode to non-linear RGB in the destination color space.
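
The three-step conversion can be sketched on the CPU as below. This is only an illustration of the order of operations: it assumes pure power-law transfer functions described by a single gamma value, whereas real transfer functions (and the actual descriptor fields) are more detailed. All names here are hypothetical.

```rust
/// Multiply a row-major 3x3 matrix by an RGB column vector.
fn apply_gamut_matrix(m: [[f32; 3]; 3], c: [f32; 3]) -> [f32; 3] {
    let mut out = [0.0; 3];
    for i in 0..3 {
        out[i] = m[i][0] * c[0] + m[i][1] * c[1] + m[i][2] * c[2];
    }
    out
}

/// Convert non-linear RGB in the source color space to non-linear RGB in the
/// destination color space. ASSUMPTION: both transfer functions are simple
/// power laws, which is a simplification made for this sketch.
fn convert_color_space(
    rgb: [f32; 3],
    src_gamma: f32,
    gamut: [[f32; 3]; 3],
    dst_gamma: f32,
) -> [f32; 3] {
    // 1. Gamma decode to linear RGB in the source color space.
    let linear_src = rgb.map(|c| c.powf(src_gamma));
    // 2. Convert from the source gamut to the destination gamut.
    let linear_dst = apply_gamut_matrix(gamut, linear_src);
    // 3. Gamma encode to non-linear RGB in the destination color space.
    linear_dst.map(|c| c.powf(1.0 / dst_gamma))
}
```

With an identity gamut matrix and matching gammas, the conversion is a round trip and returns the input up to floating-point error.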
This adds HLSL backend support for `ImageClass::External` (i.e. WGSL's
`external_texture` texture type).
For each external texture global variable in the IR, we declare 3
`Texture2D` globals as well as a `cbuffer` for the params. The
additional bindings required by these are found in the newly added
`external_texture_binding_map`. Unique names for each can be obtained
using `NameKey::ExternalTextureGlobalVariable`.
For functions that contain ImageQuery::Size, ImageLoad, or ImageSample
expressions for external textures, ensure we have generated wrapper
functions for those expressions. When emitting code for the
expressions themselves, simply insert a call to the wrapper function.
For size queries, we return the value provided in the params
struct. If that value is [0, 0] then we query the size of the plane 0
texture and return that.
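
Expressed in Rust rather than generated HLSL, the size query fallback is roughly the following. The function name and argument shapes are hypothetical, chosen only for this sketch.

```rust
/// Return the size from the params struct, falling back to the size of the
/// plane 0 texture when the params size is [0, 0].
fn external_texture_size(params_size: [u32; 2], plane0_size: [u32; 2]) -> [u32; 2] {
    if params_size == [0, 0] {
        plane0_size
    } else {
        params_size
    }
}
```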
For load and sample, we sample the textures based on the number of
planes specified in the params struct. If there is more than one plane
we additionally perform YUV to RGB conversion using the provided
matrix.
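
A CPU-side sketch of the plane selection and conversion might look like this. It is an assumption-laden illustration rather than the generated wrapper: the function name is hypothetical, the already-fetched texel of each plane is passed in directly, the two-plane case is assumed to hold Y in plane 0 and interleaved UV in plane 1, and the matrix is assumed to be a 3x4 affine transform applied to (Y, U, V, 1).

```rust
/// Combine per-plane texels into an RGBA value.
/// ASSUMPTIONS: 1 plane = RGBA already; 2 planes = Y + interleaved UV;
/// 3 planes = separate Y, U and V; `yuv_to_rgb` is a row-major 3x4 affine matrix.
fn load_external_texel(
    num_planes: u32,
    plane0: [f32; 4],
    plane1: [f32; 4],
    plane2: [f32; 4],
    yuv_to_rgb: [[f32; 4]; 3],
) -> [f32; 4] {
    if num_planes == 1 {
        // Single-plane textures hold RGBA data; no conversion is needed.
        return plane0;
    }
    let yuv = if num_planes == 2 {
        [plane0[0], plane1[0], plane1[1]]
    } else {
        [plane0[0], plane1[0], plane2[0]]
    };
    let mut rgba = [0.0, 0.0, 0.0, 1.0];
    for i in 0..3 {
        let row = yuv_to_rgb[i];
        rgba[i] = row[0] * yuv[0] + row[1] * yuv[1] + row[2] * yuv[2] + row[3];
    }
    rgba
}
```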
Unfortunately HLSL does not allow structs to contain textures, meaning
we are unable to wrap the 3 textures and params struct variables in a
single variable that can be passed around.
For our wrapper functions we therefore ensure they take the three
textures and the params as consecutive arguments. Likewise, when
declaring user-defined functions with external texture arguments, we
expand the single external texture argument into 4 consecutive
arguments. (Using NameKey::ExternalTextureFunctionArgument to ensure
unique names for each.)
Thankfully external textures can only be used as either global
variables or function arguments. This means we only have to handle the
`Expression::GlobalVariable` and `Expression::FunctionArgument` cases
of `write_expr()`. Since in both cases we know the external texture
can only be an argument to either a user-defined function or one of
our wrapper functions, we can simply emit the names of the variables
for each of the three textures and the params struct in a
comma-separated list.
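
The expansion into four consecutive arguments can be pictured with the small sketch below. The naming scheme (`_plane0`, `_params`, and so on) is invented for this example; the real backend obtains unique names through `NameKey`.

```rust
/// Build the comma-separated argument list emitted wherever an external
/// texture is passed to a function. The suffixes are hypothetical; they only
/// stand in for the unique names the backend would generate.
fn external_texture_argument_list(base: &str) -> String {
    let names = [
        format!("{base}_plane0"),
        format!("{base}_plane1"),
        format!("{base}_plane2"),
        format!("{base}_params"),
    ];
    names.join(", ")
}
```

Because the same list is emitted for both globals and function arguments, call sites need no special casing beyond choosing the base name.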
During wgsl lowering, if we encounter an external texture type then
generate the `ExternalTextureParams` struct. This will be required by
most Naga backends to implement external textures.
This type is not actually used by wgsl-in or the IR. However,
generating it in Naga IR ensures tricky details such as member
alignment are handled for us.
wgsl-out must ensure it does *not* generate code for this type, as it
handles external textures natively.
* Restore allowance of unaligned buffer-texture copies
This fixes a regression introduced by #7948. However, it makes it
possible to reach a panic in initialize_buffer_memory if the copy
requires initializing a region of memory that is not 4B aligned.
* Fix CopyT2T of multi-layer depth/stencil textures
* Adjust test list