Branch: refs/heads/master
Revision: 7050c26
Author: bradcray
Log Message:
Merge pull request #17349 from bradcray/restore-some-io-inlines
Restore some 'inline' markings to IO.chpl
[reviewed by @ronawho]
This restores some of the 'inline' markings to IO.chpl that were
removed in PR #16887. They were removed in the hope of speeding
up compilation, but that turned out not to have much of an impact, at
least for the things we measure, or in aggregate, in our testing system.
However, they did have a somewhat adverse effect on one of our fasta
implementations, which is IO-heavy, which gave us pause about the
implications for other IO-heavy programs.
This PR restores some, but not all, of those 'inline's for the following
reasons:
- restored them for lock() and unlock() because for unlocked
  files, the amount of code in those routines is minimal, and
  these proved to be on the critical path in Elliot's experiments
- restored them for read() and write() because these are the
  powerhouses of I/O, and most of the heavy lifting is done
  in per-argument subroutines, so we don't expect them to hurt
  much.
- restored them for the _to*() overloads on enums because what little
  complexity was in those routines is a param conditional, so they're
  really no more complex than the others
We thought about restoring all of them just to return to the status
quo, but I left them off the following routines:
- left them off of mark/offset because these don't seem obviously
  deeply time-critical to me and seem nontrivial
- left them off of _read_binary_internal/_write_binary_internal/
  _read_one_internal/_write_one_internal because these seem like
  precisely the type of routine that we'd want to stamp out once
  per type being read/written, and they're not obviously trivial.
- left them off of readbits and chpl_do_format for similar reasons.
Number of lines in generated code *.c for a --fast compile of fasta.chpl:
             caseyb/fasta   arkouda
master:      31682          894627
this:        31990          908974
% increase   <1%            1.6%
As a note if people wanted to mess with this more in the future:
- We noted that we could potentially create clones of routines (like
  lock/unlock) where we used 'where' clauses to inline in the no-locking
  case because the routines are virtually no-ops there; but to not
  inline in the locking case where things are going to be more
  expensive anyway
- We also noted that for --no-local compilations where many of these
  routines contain on-clauses, the inlining will really only be inlining
  the arg bundlings, since the on-clauses will already be pushed out into
  their own functions (for --local compilations, those on-clauses are
  dropped on the floor).
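A rough sketch of the first idea, for anyone picking this up later. The
names lockSketch and lockCalls are hypothetical and are not taken from
IO.chpl; the sketch also assumes the locking choice is available as a
param, so that a 'where' clause can select between the two clones at
compile time.

```chapel
// Hypothetical counter standing in for the work done by a real lock.
var lockCalls = 0;

// No-locking clone: the body is empty, so inlining it is essentially free.
inline proc lockSketch(param locking: bool) where !locking {
}

// Locking clone: stands in for the more expensive path, so it is left
// as an ordinary (non-inlined) call.
proc lockSketch(param locking: bool) where locking {
  lockCalls += 1;
}

lockSketch(false);   // resolves to the inlined, empty clone
lockSketch(true);    // resolves to the non-inlined clone
writeln(lockCalls);
```

Only the second call reaches the counter; the first resolves to the empty
clone at compile time, which is the case where inlining would pay off.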
Resolves https://github.com/Cray/chapel-private/issues/1787
Modified Files:
M modules/standard/IO.chpl
Compare: https://github.com/chapel-lang/chapel/compare/81371c98836f...7050c26f6d16