19076, "e-kayrakli", "Rename regexMatch fields to emphasize that they are byte-based", "2022-01-23T21:41:36Z"
The `regexMatch` record has two fields: `offset` and `size`.
They represent the byte offset of a match from the beginning of the string buffer and the size of the match, again in bytes. We should rename these to make it clear that they are byte-based; otherwise, they can be confused with other string fields/procs:
use Regex;
var r = compile("rkç");
var s = "Türkçe";
var match = r.search(s);
writeln(match.size); // 4: because `size` is byte-based
writeln(s[match].size); // 3: because `size` is codepoint-based
I propose we rename:
- `size` to `numBytes`: String and bytes types already have `numBytes` fields.
- `offset` to `byteOffset`: Seems clear enough, but arguably this is a bit more open for discussion.
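
For reference, here is a minimal sketch of how the string type already separates the two units. It assumes the current behavior where `size` on a string counts codepoints and `numBytes` counts bytes; the variable name is only illustrative.

```chapel
var s = "Türkçe";
writeln(s.size);      // 6: codepoint-based
writeln(s.numBytes);  // 8: "ü" and "ç" each take 2 bytes in UTF-8
```

With the proposed rename, `match.numBytes` would read the same way as `s.numBytes`, so the unit would be visible at the call site.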