mirror of
https://github.com/tailwindlabs/tailwindcss.git
synced 2025-12-08 21:36:08 +00:00
2 Commits
d0a97467f4
Improve boundary classification (#17005)
This PR cleans up the boundary character checking by using the same kind of
classification technique we already use for other classification problems.
First, this moves the boundary-related items to their own file. Next, we
set up the classification enum.
Last but not least, we removed `}` as an _after_ boundary character, and
instead handle that situation in the Ruby preprocessor, where we need
it. This means that `%w{flex}` will still work in Ruby files.
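As an illustration of moving the `}` handling into a preprocessing step, here is a minimal sketch. The function name `blank_percent_w_braces` is hypothetical and this is not the actual Ruby preprocessor from the PR; it assumes only the `%w{...}` form matters and that braces are not nested. It replaces the delimiters with spaces so the words inside are surrounded by ordinary whitespace.

```rust
/// Hypothetical helper, NOT the actual Ruby preprocessor from this PR.
/// Replaces the `{` and `}` delimiters of `%w{...}` with spaces so that the
/// words inside are surrounded by plain whitespace.
/// Assumptions: only the `%w{` form is handled and braces are not nested.
fn blank_percent_w_braces(input: &[u8]) -> Vec<u8> {
    let mut out = input.to_vec();
    let mut i = 0;

    while i + 2 < out.len() {
        if out[i] == b'%' && out[i + 1] == b'w' && out[i + 2] == b'{' {
            // Blank out the opening delimiter
            out[i + 2] = b' ';

            // Look for the closing delimiter
            let mut j = i + 3;
            while j < out.len() && out[j] != b'}' {
                j += 1;
            }

            // Blank out the closing delimiter, if there is one
            if j < out.len() {
                out[j] = b' ';
            }

            i = j;
        } else {
            i += 1;
        }
    }

    out
}
```

With this kind of step in place, a candidate such as `flex` in `%w{flex}` no longer relies on `}` being accepted as an _after_ boundary, and braces outside of `%w{...}` are left untouched.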
---
This PR is a follow-up to
https://github.com/tailwindlabs/tailwindcss/pull/17001. The main goal is
to clean up some of the boundary character checking code. The other big
improvement is performance. Changing the boundary character checking to
use a classification instead gives the following results (best score of
10 runs each):
```diff
- CandidateMachine: Throughput: 311.96 MB/s
+ CandidateMachine: Throughput: 333.52 MB/s
```
That is an improvement of roughly 21.5 MB/s (about 7%).
# Test plan
1. Existing tests should pass. Because `}` was removed as an after
boundary character, some tests were updated.
2. Added new tests to ensure the Ruby preprocessor still works as
expected.
---------
Co-authored-by: Jordan Pittman <jordan@cryptica.me>
4c110014f1
Improve internal DX around byte classification [1] (#16864)
This PR improves the internal DX of classifying `u8` values into a
smaller enum. This is done by implementing a `ClassifyBytes` derive
proc macro. The benefit is that the DX is much better, and
everything you will see here is still done at compile time.
Before:
```rs
#[derive(Debug, Clone, Copy, PartialEq)]
enum Class {
    ValidStart,
    ValidInside,
    OpenBracket,
    OpenParen,
    Slash,
    Other,
}

const CLASS_TABLE: [Class; 256] = {
    let mut table = [Class::Other; 256];

    macro_rules! set {
        ($class:expr, $($byte:expr),+ $(,)?) => {
            $(table[$byte as usize] = $class;)+
        };
    }

    macro_rules! set_range {
        ($class:expr, $start:literal ..= $end:literal) => {
            let mut i = $start;
            while i <= $end {
                table[i as usize] = $class;
                i += 1;
            }
        };
    }

    set_range!(Class::ValidStart, b'a'..=b'z');
    set_range!(Class::ValidStart, b'A'..=b'Z');
    set_range!(Class::ValidStart, b'0'..=b'9');

    set!(Class::OpenBracket, b'[');
    set!(Class::OpenParen, b'(');
    set!(Class::Slash, b'/');
    set!(Class::ValidInside, b'-', b'_', b'.');

    table
};
```
After:
```rs
#[derive(Debug, Clone, Copy, PartialEq, ClassifyBytes)]
enum Class {
    #[bytes_range(b'a'..=b'z', b'A'..=b'Z', b'0'..=b'9')]
    ValidStart,

    #[bytes(b'-', b'_', b'.')]
    ValidInside,

    #[bytes(b'[')]
    OpenBracket,

    #[bytes(b'(')]
    OpenParen,

    #[bytes(b'/')]
    Slash,

    #[fallback]
    Other,
}
```
Before, we were generating a `CLASS_TABLE` that we could access directly,
but now the table is part of `Class` itself (as `Class::TABLE`). This
means that the usage has to change:
```diff
- CLASS_TABLE[cursor.curr as usize]
+ Class::TABLE[cursor.curr as usize]
```
This is slightly more verbose, which is where another change comes in. The
`ClassifyBytes` derive macro also implements `From<u8> for #enum_name`.
This allows us to call `.into()` on any `u8`, as long as the target type
can be inferred, for example because we are matching on or comparing
against `Class` variants. In our scenario:
```diff
- Class::TABLE[cursor.curr as usize]
+ cursor.curr.into()
```
In practice, usage looks something like this:
```diff
          while cursor.pos < len {
-             match Class::TABLE[cursor.curr as usize] {
+             match cursor.curr.into() {
-                 Class::Escape => match Class::TABLE[cursor.next as usize] {
+                 Class::Escape => match cursor.next.into() {
                      // An escaped whitespace character is not allowed
                      Class::Whitespace => return MachineState::Idle,

                      // An escaped character, skip ahead to the next character
                      _ => cursor.advance(),
                  },

                  // End of the string
                  Class::Quote if cursor.curr == end_char => return self.done(start_pos, cursor),

                  // Any kind of whitespace is not allowed
                  Class::Whitespace => return MachineState::Idle,

                  // Everything else is valid
                  _ => {}
              };

              cursor.advance()
          }

          MachineState::Idle
      }
  }
```
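The loop above is an excerpt, so here is a small self-contained sketch of the same pattern that can be compiled on its own. The `scan_quoted` function and its index-based cursor are hypothetical stand-ins for the real state machine, and the `From<u8>` implementation is written by hand where the PR would derive it with `ClassifyBytes`.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Class {
    Escape,
    Quote,
    Whitespace,
    Other,
}

// Hand-written stand-in for the conversion that `ClassifyBytes` would derive.
impl From<u8> for Class {
    fn from(byte: u8) -> Self {
        match byte {
            b'\\' => Class::Escape,
            b'"' | b'\'' | b'`' => Class::Quote,
            b' ' | b'\t' | b'\n' | b'\r' | b'\x0C' => Class::Whitespace,
            _ => Class::Other,
        }
    }
}

/// Hypothetical helper: `input` must start with a quote byte. Returns the
/// index of the matching closing quote, or `None` if the string is invalid.
fn scan_quoted(input: &[u8]) -> Option<usize> {
    let end_char = *input.first()?;
    if Class::from(end_char) != Class::Quote {
        return None;
    }

    let mut pos = 1;
    while pos < input.len() {
        match input[pos].into() {
            Class::Escape => match input.get(pos + 1).copied().map(Class::from) {
                // An escaped whitespace character (or a dangling escape) is not allowed
                Some(Class::Whitespace) | None => return None,

                // An escaped character, skip ahead to the next character
                Some(_) => pos += 1,
            },

            // End of the string
            Class::Quote if input[pos] == end_char => return Some(pos),

            // Any kind of whitespace is not allowed
            Class::Whitespace => return None,

            // Everything else is valid
            _ => {}
        }

        pos += 1;
    }

    None
}
```

The `match input[pos].into()` line needs no type annotation because the match arms name `Class` variants, which is the inference the `.into()` change relies on.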
If you inspect `Class::TABLE` in your editor, for example,
you can see that it is properly generated at compile time.
Given this input:
```rs
#[derive(Clone, Copy, ClassifyBytes)]
enum Class {
    #[bytes_range(b'a'..=b'z')]
    AlphaLower,

    #[bytes_range(b'A'..=b'Z')]
    AlphaUpper,

    #[bytes(b'@')]
    At,

    #[bytes(b':')]
    Colon,

    #[bytes(b'-')]
    Dash,

    #[bytes(b'.')]
    Dot,

    #[bytes(b'\0')]
    End,

    #[bytes(b'!')]
    Exclamation,

    #[bytes_range(b'0'..=b'9')]
    Number,

    #[bytes(b'[')]
    OpenBracket,

    #[bytes(b']')]
    CloseBracket,

    #[bytes(b'(')]
    OpenParen,

    #[bytes(b'%')]
    Percent,

    #[bytes(b'"', b'\'', b'`')]
    Quote,

    #[bytes(b'/')]
    Slash,

    #[bytes(b'_')]
    Underscore,

    #[bytes(b' ', b'\t', b'\n', b'\r', b'\x0C')]
    Whitespace,

    #[fallback]
    Other,
}
```
This is the result:
<img width="1244" alt="image"
src="https://github.com/user-attachments/assets/6ffd6ad3-0b2f-4381-a24c-593e4c72080e"
/>
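Besides inspecting the table in an editor, the compile-time claim can also be checked in code. The sketch below is an assumption-laden illustration: it uses a reduced, hand-written table (the `classify` function and the free-standing `TABLE` constant are hypothetical, not output of `ClassifyBytes`) and adds `const` assertions, which the compiler evaluates during the build.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Class {
    AlphaLower,
    AlphaUpper,
    Number,
    At,
    Other,
}

// Hypothetical const classifier covering a subset of the variants above.
const fn classify(byte: u8) -> Class {
    match byte {
        b'a'..=b'z' => Class::AlphaLower,
        b'A'..=b'Z' => Class::AlphaUpper,
        b'0'..=b'9' => Class::Number,
        b'@' => Class::At,
        _ => Class::Other,
    }
}

// The table is a constant, so it is fully evaluated at compile time.
const TABLE: [Class; 256] = {
    let mut table = [Class::Other; 256];
    let mut i = 0;
    while i < 256 {
        table[i] = classify(i as u8);
        i += 1;
    }
    table
};

// These assertions run during compilation; a wrong entry fails the build.
const _: () = assert!(matches!(TABLE[b'a' as usize], Class::AlphaLower));
const _: () = assert!(matches!(TABLE[b'@' as usize], Class::At));
const _: () = assert!(matches!(TABLE[b'\0' as usize], Class::Other));
```

If any of the `const` assertions were false, compilation would stop with an error, so no runtime test is needed to confirm these entries.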