Robin Malfait 737994b7aa
Allow _ before numbers during candidate extraction (#17961)
This PR fixes a bug where a class like `header_1` wasn't extracted
because candidate extraction didn't allow an `_` directly before a number.

An `_` is now allowed before a number.
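
To illustrate the kind of rule involved, here is a minimal toy sketch in Python. It is not the actual extractor: the function name `extract_candidates`, the two character sets, and the tokenizing regex are all assumptions made up for this example. It only shows how a "what may precede a digit" check can drop `header_1`, and how allowing `_` lets it through.

```python
import re

# Toy model only: these character rules are assumptions for illustration,
# not the rules the real extractor implements.
ALLOWED_BEFORE_DIGIT_OLD = set("abcdefghijklmnopqrstuvwxyz0123456789-")
ALLOWED_BEFORE_DIGIT_NEW = ALLOWED_BEFORE_DIGIT_OLD | {"_"}


def extract_candidates(source, allowed_before_digit):
    """Return class-like tokens in which every digit follows an allowed character."""
    candidates = []
    # Hypothetical tokenizer: a lowercase letter followed by letters, digits, `_` or `-`.
    for token in re.findall(r"[a-z][a-z0-9_-]*", source):
        # Walk adjacent character pairs and check what precedes each digit.
        valid = all(
            not cur.isdigit() or prev in allowed_before_digit
            for prev, cur in zip(token, token[1:])
        )
        if valid:
            candidates.append(token)
    return candidates


source = '<div class="header_1 flex p-4">'
before_fix = extract_candidates(source, ALLOWED_BEFORE_DIGIT_OLD)
after_fix = extract_candidates(source, ALLOWED_BEFORE_DIGIT_NEW)
```

In this toy model, `before_fix` omits `header_1` while `after_fix` includes it; the other tokens are unaffected by the change.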

Fixes: #17960


## Test plan

1. Added a test to verify that `header_1` is extracted
2. Existing tests still pass

I also used the visualizer tool to verify that the `header_1` class is
being extracted:
<img width="1816" alt="image"
src="https://github.com/user-attachments/assets/fdc21602-0e2b-4e4e-92a1-19c4f4f5393f"
/>
2025-05-09 16:29:28 +00:00