* Make wgpu-core's copy_buffer_to_buffer `size` parameter optional
* Make the copy size optional in more places
* Fix for webgpu backend
* [deno_webgpu] Support additional copyBufferToBuffer signatures
* Add changelog entry
* Add copyBufferToBuffer tests to CTS test list
(This doesn't actually enable the tests for the new overloads, because
of a different error reporting issue that affects many CTS tests
including these. But if you run the tests for the new overloads
manually, before and after the fix, you can see that the behavior has
changed.)
* Reproducible formula for vendoring modified webgpu-sys
Commit the updated vendor command in all the files for consistency.
Separates the Vulkan feature sets
`VkPhysicalDeviceShaderFloat16Int8Features` and
`VkPhysicalDevice16BitStorageFeatures`, which previously were used
"together, or not at all".
This commit should not change any behavior yet, but I'd like to run full
CI tests on it for now. If the CI tests pass, I'll use this separation
to enable the `shader_int8` feature separately from the rest of the
features to enable optimizations of `[un]pack4x{I,U}8[Clamp]` on SPIR-V.
Implements more direct conversions between 32-bit integers and 4x8-bit
integer vectors using bit casting to/from `packed_[u]char4` when on
MSL 2.1+ (older versions of MSL don't seem to support these bit casts).
- `unpack4x{I, U}8(x)` becomes `[u]int4(as_type<packed_[u]char4>(x))`;
- `pack4x{I, U}8(x)` becomes `as_type<uint>(packed_[u]char4(x))`; and
- `pack4x{I, U}8Clamp(x)` becomes
`as_type<uint>(packed_uchar4(metal::clamp(x, 0, 255)))`.
These bit casts match the WGSL spec for these functions because Metal
runs on little-endian machines.
Emits vectorized SPIR-V code for the WGSL functions `unpack4xI8`,
`unpack4xU8`, `pack4xI8`, `pack4xU8`, `pack4xI8Clamp`, and
`pack4xU8Clamp` if `Capability::Int8` is available.
Exploits the following facts about SPIR-V ops:
- `SClamp`, `UClamp`, and `OpUConvert` accept vector arguments, in which
case results are computed per component; and
- `OpBitcast` can cast between vectors and scalars, with a well-defined
bit order that matches that required by the WGSL spec, see below.
WGSL spec for `pack4xI8` [1]:
> Component e[i] of the input is mapped to bits 8 x i through 8 x i + 7
> of the result.
SPIR-V spec for `OpBitcast` [2]:
> Within this mapping, any single component of `S` [remark: the type
> with fewer but wider components] (mapping to multiple components of
> `L` [remark: the type with more but narrower components]) maps its
> lower-ordered bits to the lower-numbered components of `L`.
[1] https://www.w3.org/TR/WGSL/#pack4xI8-builtin
[2] https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpBitcast
Includes the following minor functional fixes to deno_webgpu:
* Don't throw an error immediately when `create_buffer` is called with invalid usage flags.
* Implement `on_submitted_work_done`.
* Correct validation of GPUExtent3D element count.
* Run without tracing (instead of panic) if the DENO_WEBGPU_TRACE env var is not set.
Fixes#6838
* Potentially optimize `dot4{I,U}8Packed` on Metal
This might allow the Metal compiler to emit faster code (but that's not
confirmed). See
<https://github.com/gpuweb/gpuweb/issues/2677#issuecomment-1713292226>
for the optimization. The limitation to Metal 2.1+ is discussed here:
<https://github.com/gfx-rs/wgpu/pull/7574#issuecomment-2835464472>.
* [naga] Factor out new part of `put_block` on msl
CI on test failed because the latest changes to `put_block` made its
stack too big. Factoring out the new code into a separate method fixes
this issue.
* Rely on `libm` for a `no_std` alternative to `round_ties_even`
Update comments around `no_std` CI task
* Update Cargo.toml
* Feedback
Co-Authored-By: Connor Fitzgerald <connorwadefitzgerald@gmail.com>
---------
Co-authored-by: Connor Fitzgerald <connorwadefitzgerald@gmail.com>
Changes the MSL and HLSL backends to support writing only a single entry
point, and uses them that way in wgpu-hal.
This is working towards a fix for #5885.
* Increase the limit in test_stack_size
Emit optimized code for `dot4{I, U}8Packed` regardless of SPIR-V version
as long as the required capabilities are available. On SPIR-V < 1.6,
require the extension "SPV_KHR_integer_dot_product" for this. On
SPIR-V >= 1.6, don't require the extension because the corresponding
capabilities are part of SPIR-V >= 1.6 proper.
When checking for capabilities in SPIR-V,
`capabilities_available == None` indicates that all capabilities are
available. However, some capabilities are not even defined for all
language versions, so we still need to check if the requested
capabilities even exist in the language version we're using.
It used to be that `wgpu::Buffer` did not know its own size, and so slices had to potentially not know their endpoints. Now, buffers do, so slices can. This makes the code simpler, without modifying the API.