tailwindcss/crates/classification-macros
Robin Malfait 4c110014f1
Improve internal DX around byte classification [1] (#16864)
This PR improves the internal DX of classifying `u8` bytes into a
smaller enum. This is done by implementing a `ClassifyBytes` derive
proc macro. The classification is now declared with attributes on the
enum itself, and everything you will see here is done at compile time.

Before:
```rs
#[derive(Debug, Clone, Copy, PartialEq)]
enum Class {
    ValidStart,
    ValidInside,
    OpenBracket,
    OpenParen,
    Slash,
    Other,
}

const CLASS_TABLE: [Class; 256] = {
    let mut table = [Class::Other; 256];

    macro_rules! set {
        ($class:expr, $($byte:expr),+ $(,)?) => {
            $(table[$byte as usize] = $class;)+
        };
    }

    macro_rules! set_range {
        ($class:expr, $start:literal ..= $end:literal) => {
            let mut i = $start;
            while i <= $end {
                table[i as usize] = $class;
                i += 1;
            }
        };
    }

    set_range!(Class::ValidStart, b'a'..=b'z');
    set_range!(Class::ValidStart, b'A'..=b'Z');
    set_range!(Class::ValidStart, b'0'..=b'9');

    set!(Class::OpenBracket, b'[');
    set!(Class::OpenParen, b'(');

    set!(Class::Slash, b'/');

    set!(Class::ValidInside, b'-', b'_', b'.');

    table
};
```

After:
```rs
#[derive(Debug, Clone, Copy, PartialEq, ClassifyBytes)]
enum Class {
    #[bytes_range(b'a'..=b'z', b'A'..=b'Z', b'0'..=b'9')]
    ValidStart,

    #[bytes(b'-', b'_', b'.')]
    ValidInside,

    #[bytes(b'[')]
    OpenBracket,

    #[bytes(b'(')]
    OpenParen,

    #[bytes(b'/')]
    Slash,

    #[fallback]
    Other,
}
```

Previously, we generated a standalone `CLASS_TABLE` that we could index
directly. The table is now an associated constant on `Class`, which
means that the usage has to change:

```diff
- CLASS_TABLE[cursor.curr as usize]
+ Class::TABLE[cursor.curr as usize]
```

This is slightly worse DX, and this is where another change comes in. The
`ClassifyBytes` derive macro also implements `From<u8> for #enum_name`.
This allows us to call `.into()` on any `u8` wherever the compiler can
infer that a `Class` is expected, for example when matching against
`Class` variants. In our scenario:

```diff
- Class::TABLE[cursor.curr as usize]
+ cursor.curr.into()
```
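
To make the two usage changes above concrete, here is a hand-written sketch of roughly what the derive could produce for the `Class` enum from the "After" example. The expansion itself is not shown in this PR, so the way `TABLE` is filled and the body of the `From<u8>` impl are assumptions for illustration only.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Class {
    ValidStart,
    ValidInside,
    OpenBracket,
    OpenParen,
    Slash,
    Other,
}

impl Class {
    // Assumed shape: one entry per possible `u8`, built in a const context.
    const TABLE: [Class; 256] = {
        let mut table = [Class::Other; 256];

        // `for` loops are not available in const contexts, so use `while`.
        let mut i = 0usize;
        while i < 256 {
            table[i] = match i as u8 {
                b'a'..=b'z' | b'A'..=b'Z' | b'0'..=b'9' => Class::ValidStart,
                b'-' | b'_' | b'.' => Class::ValidInside,
                b'[' => Class::OpenBracket,
                b'(' => Class::OpenParen,
                b'/' => Class::Slash,
                _ => Class::Other,
            };
            i += 1;
        }

        table
    };
}

// Assumed impl: a plain lookup into the precomputed table.
impl From<u8> for Class {
    fn from(byte: u8) -> Self {
        Class::TABLE[byte as usize]
    }
}

fn main() {
    // Both spellings go through the same table lookup.
    let class: Class = b'['.into();
    assert_eq!(class, Class::OpenBracket);
    assert_eq!(Class::from(b'q'), Class::ValidStart);
    assert_eq!(Class::from(b'!'), Class::Other);
}
```

With an impl like this, `.into()` and `Class::TABLE[byte as usize]` are interchangeable; the former only hides the cast and the indexing.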

Usage-wise, this looks something like this:
```diff
        while cursor.pos < len {
-           match Class::TABLE[cursor.curr as usize] {
+           match cursor.curr.into() {
-               Class::Escape => match Class::TABLE[cursor.next as usize] {
+               Class::Escape => match cursor.next.into() {
                    // An escaped whitespace character is not allowed
                    Class::Whitespace => return MachineState::Idle,

                    // An escaped character, skip ahead to the next character
                    _ => cursor.advance(),
                },

                // End of the string
                Class::Quote if cursor.curr == end_char => return self.done(start_pos, cursor),

                // Any kind of whitespace is not allowed
                Class::Whitespace => return MachineState::Idle,

                // Everything else is valid
                _ => {}
            };

            cursor.advance()
        }

        MachineState::Idle
    }
}
```
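
The snippet above depends on surrounding types (`cursor`, `MachineState`) that are not part of this description. As a rough, self-contained sketch of the same `match byte.into()` pattern, the example below uses a hypothetical `find_closing_quote` helper and a hand-written `From<u8>` impl in place of the derive. It simplifies the original: it works on a plain byte slice and treats a trailing escape as invalid.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Class {
    Escape,
    Quote,
    Whitespace,
    Other,
}

// Hand-written stand-in for what `ClassifyBytes` would provide.
impl From<u8> for Class {
    fn from(byte: u8) -> Self {
        match byte {
            b'\\' => Class::Escape,
            b'"' | b'\'' | b'`' => Class::Quote,
            b' ' | b'\t' | b'\n' | b'\r' | b'\x0C' => Class::Whitespace,
            _ => Class::Other,
        }
    }
}

/// Hypothetical helper: given input that starts with a quote, returns the
/// index of the matching closing quote, or `None` if the string is invalid.
fn find_closing_quote(input: &[u8]) -> Option<usize> {
    let end_char = *input.first()?;
    if Class::from(end_char) != Class::Quote {
        return None;
    }

    let mut pos = 1;
    while pos < input.len() {
        // The `Class::...` patterns let the compiler infer the target of `.into()`.
        match input[pos].into() {
            Class::Escape => match input.get(pos + 1).copied().map(Class::from) {
                // An escaped whitespace character (or nothing to escape) is not allowed
                Some(Class::Whitespace) | None => return None,

                // An escaped character, skip ahead to the next character
                Some(_) => pos += 1,
            },

            // End of the string
            Class::Quote if input[pos] == end_char => return Some(pos),

            // Any kind of whitespace is not allowed
            Class::Whitespace => return None,

            // Everything else is valid
            _ => {}
        }

        pos += 1;
    }

    None
}

fn main() {
    assert_eq!(find_closing_quote(b"'abc'"), Some(4));
    assert_eq!(find_closing_quote(b"'a b'"), None);
}
```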


If you inspect `Class::TABLE` in your editor, for example, you can see
that it is properly generated at compile time.

Given this input:
```rs
#[derive(Clone, Copy, ClassifyBytes)]
enum Class {
    #[bytes_range(b'a'..=b'z')]
    AlphaLower,

    #[bytes_range(b'A'..=b'Z')]
    AlphaUpper,

    #[bytes(b'@')]
    At,

    #[bytes(b':')]
    Colon,

    #[bytes(b'-')]
    Dash,

    #[bytes(b'.')]
    Dot,

    #[bytes(b'\0')]
    End,

    #[bytes(b'!')]
    Exclamation,

    #[bytes_range(b'0'..=b'9')]
    Number,

    #[bytes(b'[')]
    OpenBracket,

    #[bytes(b']')]
    CloseBracket,

    #[bytes(b'(')]
    OpenParen,

    #[bytes(b'%')]
    Percent,

    #[bytes(b'"', b'\'', b'`')]
    Quote,

    #[bytes(b'/')]
    Slash,

    #[bytes(b'_')]
    Underscore,

    #[bytes(b' ', b'\t', b'\n', b'\r', b'\x0C')]
    Whitespace,

    #[fallback]
    Other,
}
```

This is the result:
<img width="1244" alt="image"
src="https://github.com/user-attachments/assets/6ffd6ad3-0b2f-4381-a24c-593e4c72080e"
/>
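
Besides inspecting the table in an editor, a const assertion is another way to confirm that entries are computed at compile time. The sketch below uses a reduced, hand-written `TABLE` (three variants only) as a stand-in for the generated one, so its contents are an assumption; the `const _` block is evaluated by the compiler, and a wrong entry would fail the build rather than a test run.

```rust
#[derive(Clone, Copy)]
enum Class {
    AlphaLower,
    At,
    Other,
}

impl Class {
    // Reduced stand-in for the generated table.
    const TABLE: [Class; 256] = {
        let mut table = [Class::Other; 256];

        let mut i = b'a';
        while i <= b'z' {
            table[i as usize] = Class::AlphaLower;
            i += 1;
        }

        table[b'@' as usize] = Class::At;
        table
    };
}

// Checked during compilation: if any entry is wrong, the crate does not build.
const _: () = {
    assert!(matches!(Class::TABLE[b'a' as usize], Class::AlphaLower));
    assert!(matches!(Class::TABLE[b'@' as usize], Class::At));
    assert!(matches!(Class::TABLE[b'A' as usize], Class::Other));
};

fn main() {}
```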
2025-03-05 14:00:07 +01:00