Separates the Vulkan feature sets
`VkPhysicalDeviceShaderFloat16Int8Features` and
`VkPhysicalDevice16BitStorageFeatures`, which previously were used
"together, or not at all".
This commit should not change any behavior yet, but I'd like to run full
CI tests on it for now. If the CI tests pass, I'll use this separation
to enable the `shader_int8` feature separately from the rest of the
features to enable optimizations of `[un]pack4x{I,U}8[Clamp]` on SPIR-V.
Implements more direct conversions between 32-bit integers and 4x8-bit
integer vectors using bit casting to/from `packed_[u]char4` when on
MSL 2.1+ (older versions of MSL don't seem to support these bit casts).
- `unpack4x{I, U}8(x)` becomes `[u]int4(as_type<packed_[u]char4>(x))`;
- `pack4x{I, U}8(x)` becomes `as_type<uint>(packed_[u]char4(x))`;
- `pack4xI8Clamp(x)` becomes
`as_type<uint>(packed_char4(metal::clamp(x, -128, 127)))`; and
- `pack4xU8Clamp(x)` becomes
`as_type<uint>(packed_uchar4(metal::clamp(x, 0, 255)))`.
These bit casts match the WGSL spec for these functions because Metal
runs on little-endian machines.
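As a quick sanity check of the little-endian argument, here is a Rust model rather than naga's MSL output. The function names are hypothetical, and `u32::from_le_bytes`/`to_le_bytes` stand in for MSL's `as_type` bit casts on a little-endian machine. The shift-based version follows the WGSL spec wording directly, so agreement between the two is what makes the bit casts valid.

```rust
/// `pack4xI8` as the WGSL spec words it: component `i` of the input is
/// mapped to bits `8 * i` through `8 * i + 7` of the result.
fn pack4x_i8_shifts(e: [i32; 4]) -> u32 {
    let mut result = 0u32;
    for (i, &c) in e.iter().enumerate() {
        // Keep only the low 8 bits of the component and move them to byte `i`.
        result |= ((c as u32) & 0xFF) << (8 * i);
    }
    result
}

/// Same value via a byte reinterpretation: truncate each lane to 8 bits
/// and read the four bytes as one little-endian `u32`
/// (analogous to `as_type<uint>(packed_char4(x))`).
fn pack4x_i8_bytes(e: [i32; 4]) -> u32 {
    u32::from_le_bytes(e.map(|c| c as u8))
}

/// `unpack4xI8`: reinterpret as four signed bytes and sign-extend each
/// (analogous to `int4(as_type<packed_char4>(x))`).
fn unpack4x_i8_bytes(x: u32) -> [i32; 4] {
    x.to_le_bytes().map(|b| b as i8 as i32)
}

/// `unpack4xU8`: the same, but zero-extending
/// (analogous to `uint4(as_type<packed_uchar4>(x))`).
fn unpack4x_u8_bytes(x: u32) -> [u32; 4] {
    x.to_le_bytes().map(u32::from)
}
```

For example, packing `[1, -1, 127, -128]` gives `0x807F_FF01` either way, and unpacking that value as unsigned bytes gives `[1, 255, 127, 128]`.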
Emits vectorized SPIR-V code for the WGSL functions `unpack4xI8`,
`unpack4xU8`, `pack4xI8`, `pack4xU8`, `pack4xI8Clamp`, and
`pack4xU8Clamp` if `Capability::Int8` is available.
Exploits the following facts about SPIR-V ops:
- the GLSL.std.450 extended instructions `SClamp` and `UClamp`, as well as
`OpUConvert`, accept vector arguments, in which case results are computed
per component; and
- `OpBitcast` can cast between vectors and scalars, with a well-defined
bit order that matches the one required by the WGSL spec (see below).
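A Rust model of the clamped variants, assuming the lowering is clamp, then narrow to 8 bits, then bit cast. The function names are hypothetical. `Ord::clamp` plays the role of a per-component `SClamp`/`UClamp`, the `as u8` casts play the role of the integer conversion, and `u32::from_le_bytes` reproduces the `OpBitcast` ordering in which lane 0 lands in the lowest-order byte.

```rust
/// Models `pack4xI8Clamp` as clamp -> narrow -> bit cast.
fn pack4x_i8_clamp(e: [i32; 4]) -> u32 {
    // Per-lane clamp to the i8 range (cf. a vector `SClamp`), then
    // truncation to the low 8 bits (cf. the vector integer conversion).
    let lanes = e.map(|c| c.clamp(-128, 127) as u8);
    // Lane 0 ends up in the lowest-order byte (cf. `OpBitcast`).
    u32::from_le_bytes(lanes)
}

/// Models `pack4xU8Clamp` the same way, clamping each lane to 0..=255
/// (cf. a vector `UClamp`).
fn pack4x_u8_clamp(e: [u32; 4]) -> u32 {
    let lanes = e.map(|c| c.clamp(0, 255) as u8);
    u32::from_le_bytes(lanes)
}
```

Out-of-range lanes saturate instead of wrapping: `[300, -300, 5, -5]` packs to `0xFB05_807F`.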
WGSL spec for `pack4xI8` [1]:
> Component e[i] of the input is mapped to bits 8 x i through 8 x i + 7
> of the result.
SPIR-V spec for `OpBitcast` [2]:
> Within this mapping, any single component of `S` [remark: the type
> with fewer but wider components] (mapping to multiple components of
> `L` [remark: the type with more but narrower components]) maps its
> lower-ordered bits to the lower-numbered components of `L`.
[1] https://www.w3.org/TR/WGSL/#pack4xI8-builtin
[2] https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpBitcast
Includes the following minor functional fixes to deno_webgpu:
* Don't throw an error immediately when `create_buffer` is called with invalid usage flags.
* Implement `on_submitted_work_done`.
* Correct validation of GPUExtent3D element count.
* Run without tracing (instead of panicking) if the DENO_WEBGPU_TRACE env var is not set.
Fixes #6838
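For the GPUExtent3D fix, a sketch of the element-count check, assuming the rule from the WebGPU spec: a sequence-form extent must have 1 to 3 elements, with `height` and `depthOrArrayLayers` defaulting to 1. `extent3d_from_sequence` is a hypothetical helper, not deno_webgpu's actual code.

```rust
/// Hypothetical helper: convert a sequence-form GPUExtent3D into
/// (width, height, depthOrArrayLayers), applying the spec defaults.
fn extent3d_from_sequence(seq: &[u32]) -> Result<(u32, u32, u32), String> {
    // Fewer than 1 or more than 3 elements is rejected (a TypeError in JS).
    if seq.is_empty() || seq.len() > 3 {
        return Err(format!(
            "GPUExtent3D must have between 1 and 3 elements, got {}",
            seq.len()
        ));
    }
    let width = seq[0];
    // Missing trailing elements take their default value of 1.
    let height = seq.get(1).copied().unwrap_or(1);
    let depth_or_array_layers = seq.get(2).copied().unwrap_or(1);
    Ok((width, height, depth_or_array_layers))
}
```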
* Potentially optimize `dot4{I,U}8Packed` on Metal
This might allow the Metal compiler to emit faster code (but that's not
confirmed). See
<https://github.com/gpuweb/gpuweb/issues/2677#issuecomment-1713292226>
for the optimization. The limitation to Metal 2.1+ is discussed here:
<https://github.com/gfx-rs/wgpu/pull/7574#issuecomment-2835464472>.
* [naga] Factor out new part of `put_block` on msl
A CI test failed because the latest changes to `put_block` made its
stack usage too large. Factoring the new code out into a separate method
fixes this issue.
* Rely on `libm` for a `no_std` alternative to `round_ties_even`
Update comments around `no_std` CI task
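For context, on stable Rust `f64::round_ties_even` lives in `std`, not `core`, hence the need for a `no_std` alternative. The sketch below does not show what `libm` does; it only illustrates round-half-to-even semantics with the classic 2^52 trick, assuming IEEE 754 binary64 addition in round-to-nearest-even mode (the default on common Rust targets). `round_ties_even_sketch` is a hypothetical name.

```rust
/// 2^52: every finite f64 with at least this magnitude is already an integer.
const TWO_POW_52: f64 = 4_503_599_627_370_496.0;

/// Round to the nearest integer, sending exact halves to the even neighbour.
/// Sketch only; assumes binary64 addition rounds to nearest, ties to even.
fn round_ties_even_sketch(x: f64) -> f64 {
    // NaN, infinities and |x| >= 2^52 are returned unchanged
    // (the comparison is false for NaN, so it takes this branch too).
    if !(x > -TWO_POW_52 && x < TWO_POW_52) {
        return x;
    }
    if x >= 0.0 {
        // In [2^52, 2^53) the spacing between f64 values is exactly 1,
        // so the addition itself performs the tie-to-even rounding.
        // Note: an input of -0.0 also lands here and yields +0.0.
        (x + TWO_POW_52) - TWO_POW_52
    } else {
        // Round the magnitude and restore the sign.
        -((-x + TWO_POW_52) - TWO_POW_52)
    }
}
```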
* Update Cargo.toml
* Feedback
Co-Authored-By: Connor Fitzgerald <connorwadefitzgerald@gmail.com>
Changes the MSL and HLSL backends to support writing only a single entry
point, and uses them that way in wgpu-hal.
This is working towards a fix for #5885.
* Increase the limit in `test_stack_size`
Emit optimized code for `dot4{I, U}8Packed` regardless of SPIR-V version
as long as the required capabilities are available. On SPIR-V < 1.6,
require the extension "SPV_KHR_integer_dot_product" for this. On
SPIR-V >= 1.6, don't require the extension because the corresponding
capabilities are part of SPIR-V >= 1.6 proper.
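A sketch of the version-dependent decision described above, as a hypothetical helper (`choose_dot4_lowering` and `Dot4Lowering` are not naga names). It only models the version check; the real capability bookkeeping is more involved.

```rust
/// Hypothetical outcome of lowering `dot4{I,U}8Packed` to SPIR-V.
#[derive(Debug, PartialEq)]
enum Dot4Lowering {
    /// Use the integer dot product instructions, declaring `extension` if `Some`.
    Native { extension: Option<&'static str> },
    /// Unpack the bytes and multiply/add them with ordinary instructions.
    Polyfill,
}

fn choose_dot4_lowering(version: (u8, u8), dot_product_caps_available: bool) -> Dot4Lowering {
    if !dot_product_caps_available {
        return Dot4Lowering::Polyfill;
    }
    let extension = if version < (1, 6) {
        // Before SPIR-V 1.6 the capabilities come from the extension.
        Some("SPV_KHR_integer_dot_product")
    } else {
        // From SPIR-V 1.6 on they are part of the core specification.
        None
    };
    Dot4Lowering::Native { extension }
}
```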
When checking for capabilities in SPIR-V,
`capabilities_available == None` indicates that all capabilities are
available. However, some capabilities are not defined in every SPIR-V
version, so we still need to check whether the requested capabilities
exist in the language version we're targeting.
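A sketch of that check, with a hypothetical two-variant `Capability` enum. `GroupNonUniform` serves as an example of a capability that SPIR-V only defines from version 1.3 on; naga's real capability handling differs.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Capability {
    Shader,
    GroupNonUniform,
}

impl Capability {
    /// First SPIR-V version that defines the capability.
    fn first_version(self) -> (u8, u8) {
        match self {
            Capability::Shader => (1, 0),
            Capability::GroupNonUniform => (1, 3),
        }
    }
}

/// `available == None` means "every capability is available", but the
/// capability must still be defined in the targeted SPIR-V version.
fn capability_usable(
    cap: Capability,
    version: (u8, u8),
    available: Option<&HashSet<Capability>>,
) -> bool {
    if version < cap.first_version() {
        return false;
    }
    match available {
        None => true,
        Some(set) => set.contains(&cap),
    }
}
```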
Previously, `wgpu::Buffer` did not know its own size, so a slice could not always know its end point. Now that buffers know their size, slices can store concrete endpoints. This makes the code simpler without modifying the API.
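A simplified model of why a known buffer size lets a slice store concrete endpoints: an open-ended range such as `16..` can be resolved against the size immediately. `Buffer` and `BufferSlice` here are hypothetical stand-ins, not wgpu's types, although `wgpu::Buffer::slice` does accept any `RangeBounds<BufferAddress>`.

```rust
use std::ops::{Bound, RangeBounds};

/// Hypothetical buffer that knows its own size in bytes.
struct Buffer {
    size: u64,
}

/// A slice with concrete endpoints: no `Option` is needed for the end.
#[derive(Debug, PartialEq)]
struct BufferSlice {
    offset: u64,
    end: u64,
}

impl Buffer {
    fn slice<R: RangeBounds<u64>>(&self, bounds: R) -> BufferSlice {
        let offset = match bounds.start_bound() {
            Bound::Included(&s) => s,
            Bound::Excluded(&s) => s + 1,
            Bound::Unbounded => 0,
        };
        // Because the buffer knows its size, an open end resolves right away.
        let end = match bounds.end_bound() {
            Bound::Included(&e) => e + 1,
            Bound::Excluded(&e) => e,
            Bound::Unbounded => self.size,
        };
        assert!(offset <= end && end <= self.size, "slice out of bounds");
        BufferSlice { offset, end }
    }
}
```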
* [naga glsl-out] Differentiate between support for `std140` and `std430` layout, and emit `std140` in Uniforms when possible
* [naga glsl-out] Remove storage buffer std140 layout fallback, and error when we are unable to assign an explicit memory layout for uniform and storage globals
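A sketch of the layout choice these two changes describe, assuming uniforms only ever get `std140` and storage globals only ever get `std430`, with an error instead of a fallback otherwise. `pick_layout`, `LayoutSupport`, and the enums are hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Layout {
    Std140,
    Std430,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum AddressSpace {
    Uniform,
    Storage,
}

/// Hypothetical flags describing what the target GLSL version supports.
struct LayoutSupport {
    std140: bool,
    std430: bool,
}

fn pick_layout(space: AddressSpace, support: &LayoutSupport) -> Result<Layout, String> {
    match space {
        AddressSpace::Uniform if support.std140 => Ok(Layout::Std140),
        AddressSpace::Storage if support.std430 => Ok(Layout::Std430),
        // No std140 fallback for storage buffers: report an error instead.
        _ => Err(format!("no explicit memory layout available for {:?} globals", space)),
    }
}
```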
Co-authored-by: teoxoy <28601907+teoxoy@users.noreply.github.com>
The `metal` crate is currently unsound regarding unknown/future enum
variants, see https://github.com/gfx-rs/metal-rs/issues/209 and
https://github.com/rust-lang/rfcs/pull/3803.
`objc2-metal` fixes this by emitting C enums as a newtype + constants
for each variant, but that prevents us from importing the
variants/constants. So, in preparation for the migration, this commit
converts our code to a pattern that works with that representation.
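A minimal model of the newtype-plus-constants representation and the matching pattern it forces. `MTLExampleMode` is a hypothetical type, not one of objc2-metal's definitions. Since associated constants cannot be imported with `use`, each one is written with its type prefix, and a catch-all arm handles values this code does not know about.

```rust
/// Hypothetical newtype for a C enum: any integer value is representable,
/// so a variant added in a future OS version is not undefined behavior.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(transparent)]
struct MTLExampleMode(usize);

#[allow(non_upper_case_globals)]
impl MTLExampleMode {
    const Fast: Self = Self(0);
    const Precise: Self = Self(1);
}

fn describe(mode: MTLExampleMode) -> &'static str {
    // `use MTLExampleMode::*` would not compile, so constants keep their prefix.
    // Matching on them works because the type derives `PartialEq` and `Eq`.
    match mode {
        MTLExampleMode::Fast => "fast",
        MTLExampleMode::Precise => "precise",
        // Unknown values are representable, so a catch-all arm is required.
        _ => "unknown",
    }
}
```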
A lot of Metal types are prefixed with MTL, which makes it quite clear
where they're coming from. This means that we don't lose any clarity if
we import them instead of having them prefixed with `metal::`.
This will make it easier to migrate to `objc2-metal`: since that crate is
named differently from the `metal` crate, fewer `metal::` paths will need
to be rewritten.
The old `is_compatible_with` handled scalar/scalar, scalar/vector, and vector/vector, but was missing vector/scalar.
Since `is_compatible_with` is only used for vertex shader inputs, and vertex shader inputs can't be matrices (only scalars and vectors), we can simplify it by removing the other match arms and only checking the kind.
Fixes #7568
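A sketch of the simplified check, using hypothetical `ScalarKind`/`InputType` types rather than naga's. Once matrices are ruled out, comparing the scalar kind alone covers all four scalar/vector combinations, including the previously missing vector/scalar case.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ScalarKind {
    Sint,
    Uint,
    Float,
}

/// Vertex inputs are only ever scalars or vectors, never matrices.
#[derive(Clone, Copy, Debug)]
#[allow(dead_code)]
enum InputType {
    Scalar(ScalarKind),
    Vector { size: u8, kind: ScalarKind },
}

impl InputType {
    fn kind(self) -> ScalarKind {
        match self {
            InputType::Scalar(kind) | InputType::Vector { kind, .. } => kind,
        }
    }

    /// Comparing kinds handles scalar/scalar, scalar/vector,
    /// vector/vector and vector/scalar uniformly.
    fn is_compatible_with(self, other: InputType) -> bool {
        self.kind() == other.kind()
    }
}
```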