Mirror of https://github.com/protobufjs/protobuf.js.git (synced 2025-12-08 20:58:55 +00:00)
* fix utf8 -> utf16 decoding bug on surrogate pairs

  This fixes https://github.com/protobufjs/protobuf.js/issues/1473

  The custom utf8 -> utf16 decoder appears to be subtly flawed. From my reading, the chunking mechanism doesn't account for surrogate pairs at the end of a chunk, which causes variable-size chunks. A larger chunk followed by a smaller chunk leaves behind garbage that gets included in the latter chunk.

  It looks like the chunking mechanism was added to prevent stack overflows when calling `fromCharCode` with too many args. From some benchmarking, putting utf16 code units in an array and spreading that into `fromCharCode` wasn't helping performance much anyway, so I simplified it significantly.

  Here's a repro of the existing encoding bug in a fuzzing suite: https://repl.it/@turbio/oh-no-our-strings#decoder.js

* fix lint
* add test case for surrogate pair bug

Co-authored-by: Alexander Fenster <fenster@google.com>
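The failure mode described in the commit message comes from reusing a fixed-size chunk array whose filled length varies when a surrogate pair lands at a chunk boundary. A minimal sketch of a chunk-free decoder is shown below. It is an illustration under stated assumptions, not the library's source: `decodeUtf8` is a hypothetical name, the input is assumed to be well-formed UTF-8, and stray continuation bytes are simply skipped.

```javascript
// Hypothetical helper (not the library's code): decodes the well-formed UTF-8
// bytes in buffer[start, end) into a JavaScript (UTF-16) string.
function decodeUtf8(buffer, start, end) {
  let out = "";
  let i = start;
  while (i < end) {
    const b = buffer[i++];
    if (b < 0x80) {
      // 1-byte sequence (ASCII)
      out += String.fromCharCode(b);
    } else if (b >= 0xc0 && b < 0xe0) {
      // 2-byte sequence
      out += String.fromCharCode(((b & 0x1f) << 6) | (buffer[i++] & 0x3f));
    } else if (b >= 0xe0 && b < 0xf0) {
      // 3-byte sequence
      out += String.fromCharCode(
        ((b & 0x0f) << 12) | ((buffer[i++] & 0x3f) << 6) | (buffer[i++] & 0x3f)
      );
    } else if (b >= 0xf0) {
      // 4-byte sequence: a code point above U+FFFF becomes a surrogate pair.
      // Both code units are appended together, so there is no chunk boundary
      // at which the pair could be split or stale data could be left behind.
      const cp =
        (((b & 0x07) << 18) |
          ((buffer[i++] & 0x3f) << 12) |
          ((buffer[i++] & 0x3f) << 6) |
          (buffer[i++] & 0x3f)) - 0x10000;
      out += String.fromCharCode(0xd800 + (cp >> 10), 0xdc00 + (cp & 0x3ff));
    }
    // Bytes 0x80..0xBF outside a sequence are ignored in this sketch.
  }
  return out;
}

// U+1F600 encoded as UTF-8; decodes to a single emoji (one surrogate pair).
console.log(decodeUtf8(new Uint8Array([0xf0, 0x9f, 0x98, 0x80]), 0, 4));
```

Appending to the string directly trades the batching of `fromCharCode` calls for simplicity, which matches the commit's observation that the batching was not buying much performance.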
@protobufjs/utf8
A minimal UTF8 implementation for number arrays.
API
- utf8.length(string: string): number
  Calculates the UTF8 byte length of a string.
- utf8.read(buffer: Uint8Array, start: number, end: number): string
  Reads UTF8 bytes as a string.
- utf8.write(string: string, buffer: Uint8Array, offset: number): number
  Writes a string as UTF8 bytes.
License: BSD 3-Clause License