[Chapel Merge] improve readBytesOrString especially for readAll

chapelhpcBot · April 11, 2024, 2:58pm

Branch: refs/heads/main
Revision: 1b2bdd661e46946a56cf45dd7d2ff5070edc2b14
Author: mppf
Link: improve readBytesOrString especially for readAll by mppf · Pull Request #24783 · chapel-lang/chapel · GitHub
Log Message:
improve readBytesOrString especially for readAll (#24783)

This PR improves the performance of I/O routines that were calling
readBytesOrString : readString, readBytes, readAll, and
readBinary(string|bytes). This PR is motivated by wanting to improve
the performance of the regexredux2 and regexredux3 CLBG benchmarks which
use readAll. We identified that the readAll call was slower than
similar code in C++.

The approach here is to factor readBytesOrString into readBytesImpl
and readStringImpl.

readBytesImpl uses repeated calls to qio_channel_read while doubling
the size of the buffer as needed to store more data in it. It uses the
initial file size (which is already computed in the I/O library and just
needed a function to get at it) to compute an initial buffer size. This
avoids several kinds of overhead and forms an alternative implementation
of #19270.

readStringImpl is similar, but it reads a codepoint at a time, in
order to preserve the end offset when invalid UTF-8 is encountered (see
PR #24807).

This PR also resolves #24781 by adjusting the documentation for
string.createAdoptingBuffer to clarify that the buffer is not freed
when an error is thrown.

This PR removes some special case code that was making readString use
maxSize in bytes, rather than in codepoints, when the undocumented
binary mode. This makes readString more consistent.

This PR changes the behavior on error for the readString,
readBinary(string/bytes), and readAll functions. It changes them so
that, on an error such as invalid UTF-8, the channel position is left
near the error. The alternative behavior of returning to the start of
the read on such an error would require extra memory / buffering and
reduce performance. Anyway, the offset after trying to decode invalid
UTF-8 is now tested in
test/io/ferguson/utf8/various-reads-invalid-utf8.chpl.

Lastly, this PR updates the documentation for a number of read*
functions in order to clarify where the channel position is left on an
error.

How does it affect performance? These measurements were collected on my
PC with start_test -performance test/studies/shootout/submitted/regexredux*.chpl --numtrials 4 (and
note that main here includes #24757). Please note that the performance
gain for readAll itself will be a larger percentage (since it is not
most of these benchmarks).

Benchmark main this PR speedup

regexredux2 1.55 s 1.45 s 7% faster

regexredux3 1.29 s 1.17 s 10% faster

Reviewed by @vasslitvinov and @jeremiah-corrado - thanks!

full comm=none testing

Compare: Comparing f0a23dcee339d77394be5cc3df196b3667295f31...3f53a2c8b881602cfaf722b2b8c896ff590071b7 · chapel-lang/chapel · GitHub

Diff:
M modules/internal/String.chpl
M modules/standard/IO.chpl
M runtime/include/qio/qio.h
M test/io/ferguson/utf8/readstring-stdin-binary-utf8.chpl
M test/io/ferguson/utf8/readstring-stdin-binary-utf8.good
M test/io/ferguson/utf8/readstring-stdin-text-utf8.chpl
M test/io/ferguson/utf8/readstring-stdin-text-utf8.good
A test/io/ferguson/utf8/various-reads-invalid-utf8.chpl
A test/io/ferguson/utf8/various-reads-invalid-utf8.good
A test/library/standard/IO/readAll/readStringInvalid.chpl
A test/library/standard/IO/readAll/readStringInvalid.good
A test/library/standard/IO/readStringAndBytes/checkAllocationSize.chpl
A test/library/standard/IO/readStringAndBytes/checkAllocationSize.good
A test/library/standard/IO/readStringAndBytes/moby.txt
M test/library/standard/IO/readStringAndBytes/readBytes.chpl
M test/library/standard/IO/readStringAndBytes/readString.chpl
A test/library/standard/IO/readStringAndBytes/readStringUnicode.chpl
A test/library/standard/IO/readStringAndBytes/readStringUnicode.good
A test/library/standard/IO/readStringAndBytes/snowmen.txt
https://github.com/chapel-lang/chapel/pull/24783.diff

Benchmark	main	this PR	speedup
regexredux2	1.55 s	1.45 s	7% faster
regexredux3	1.29 s	1.17 s	10% faster

[Chapel Merge] improve readBytesOrString especially for readAll

Related Topics