protobuf.js

archive-gh-me/protobuf.js

mirror of https://github.com/protobufjs/protobuf.js.git synced 2025-12-08 20:58:55 +00:00

Commit Graph

Author

SHA1

Message

Date

Mason Clayton

75172cd11b

fix: utf8 -> utf16 decoding bug on surrogate pairs (#1486)

* fix utf8 -> utf16 decoding bug on surrogate pairs

This fixes https://github.com/protobufjs/protobuf.js/issues/1473

The custom utf8 -> utf16 decoder appears to be subtly flawed. From my reading, the chunking mechanism doesn't account for surrogate pairs at the end of a chunk, which causes variable-size chunks. A larger chunk followed by a smaller chunk leaves behind garbage that gets included in the latter chunk.
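
The failure mode described above can be reproduced in miniature. The sketch below is a toy reconstruction, not the library's code: the function name `joinWithReusedChunk`, the `groups` input shape, and the tiny chunk limit of 4 are all assumptions chosen for illustration. It reuses one array across flushes without truncating it, so a surrogate pair that pushes one chunk past the limit leaves a stale code unit behind for the next flush.

```javascript
// Toy reconstruction with hypothetical names; NOT the protobuf.js source.
// Each entry of `groups` holds the UTF-16 code units of one decoded character:
// one unit for a BMP character, two for a surrogate pair.
function joinWithReusedChunk(groups, limit) {
  const chunk = []; // reused across flushes and never truncated
  const parts = [];
  let i = 0;        // number of valid code units currently in chunk

  for (const group of groups) {
    for (const unit of group) chunk[i++] = unit; // a pair writes two units
    if (i >= limit) {
      // Flushes the whole array, including any stale tail left behind
      // by an earlier chunk that grew past the limit.
      parts.push(String.fromCharCode.apply(String, chunk));
      i = 0;
    }
  }
  if (i > 0) parts.push(String.fromCharCode.apply(String, chunk.slice(0, i)));
  return parts.join("");
}

// "abc", then U+1F600 as a surrogate pair, then "defg".
const demoGroups = [
  [0x61], [0x62], [0x63], [0xD83D, 0xDE00],
  [0x64], [0x65], [0x66], [0x67],
];
const demoExpected = "abc\uD83D\uDE00defg";
const demoActual = joinWithReusedChunk(demoGroups, 4);

// The first flush holds 5 units, the second only 4, so the old fifth unit
// (the low surrogate 0xDE00) is emitted again at the end of the output.
console.log(demoActual === demoExpected);           // false
console.log(demoActual.length, demoExpected.length); // 10 9
```

With a limit of 4, the pair lands on slots 3 and 4, the first flush emits five units, and the second flush re-emits the leftover low surrogate after "defg".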

It looks like the chunking mechanism was added to prevent stack overflows when calling `fromCharCode` with too many args. From some benchmarking, it appears that putting utf16 code units in an array and spreading that into `fromCharCode` wasn't helping performance much anyway, so I simplified it significantly.
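
As a rough illustration of what a chunk-free decoder can look like, here is a minimal sketch that appends each decoded code unit directly to the result string, so there is no shared buffer to go stale and no large argument list passed to `fromCharCode`. The name `decodeUtf8Sketch` is hypothetical, the code is not taken from the commit, and it assumes well-formed UTF-8 input (no validation of continuation bytes or truncated sequences).

```javascript
// Minimal sketch with a hypothetical name; NOT the code from this commit.
// Assumes `buffer` holds well-formed UTF-8 between `start` and `end`.
function decodeUtf8Sketch(buffer, start, end) {
  let str = "";
  let i = start;
  while (i < end) {
    const t = buffer[i++];
    if (t < 0x80) {
      // 1-byte sequence (ASCII)
      str += String.fromCharCode(t);
    } else if (t >= 0xC0 && t < 0xE0) {
      // 2-byte sequence
      str += String.fromCharCode((t & 0x1F) << 6 | buffer[i++] & 0x3F);
    } else if (t >= 0xE0 && t < 0xF0) {
      // 3-byte sequence
      str += String.fromCharCode(
        (t & 0x0F) << 12 | (buffer[i++] & 0x3F) << 6 | buffer[i++] & 0x3F
      );
    } else if (t >= 0xF0) {
      // 4-byte sequence: code point above U+FFFF, emitted as a surrogate pair
      const cp = ((t & 0x07) << 18 | (buffer[i++] & 0x3F) << 12 |
                  (buffer[i++] & 0x3F) << 6 | buffer[i++] & 0x3F) - 0x10000;
      str += String.fromCharCode(0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF));
    }
    // Stray continuation bytes (0x80..0xBF) are skipped in this sketch.
  }
  return str;
}

// Usage: the four UTF-8 bytes of U+1F600 decode to one surrogate pair.
const sampleBytes = Uint8Array.of(0xF0, 0x9F, 0x98, 0x80);
const sampleText = decodeUtf8Sketch(sampleBytes, 0, sampleBytes.length);
console.log(sampleText === "\uD83D\uDE00"); // true
```

Because each call to `String.fromCharCode` here receives at most two arguments, the argument-count concern that motivated chunking does not arise in this form.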

Here's a repro of the existing encoding bug in a fuzzing suite
https://repl.it/@turbio/oh-no-our-strings#decoder.js

* fix lint

* add test case for surrogate pair bug

Co-authored-by: Alexander Fenster <fenster@google.com>

2020-10-09 15:54:17 -07:00

dcodeIO

2b12fb7db9

Other: Moved micro modules to lib so they can have their own tests etc.; Fixed: Make sure to check optional inner messages for null when encoding, see #658

2017-01-25 04:39:43 +01:00

2 Commits