[Chapel Merge] Restore some 'inline' markings to IO.chpl

Branch: refs/heads/master
Revision: 7050c26
Author: bradcray
Log Message:

Merge pull request #17349 from bradcray/restore-some-io-inlines

Restore some 'inline' markings to IO.chpl

[reviewed by @ronawho]

This restores some of the 'inline' markings in IO.chpl that were
removed in PR #16887. They were removed in the hope of speeding up
compilation, but the removal turned out not to have much impact, at
least for the things we measure, or in aggregate, in our testing system.
It did, however, have a somewhat adverse effect on one of our fasta
implementations, which is IO-heavy, and that gave us pause about the
implications for other IO-heavy programs.

This PR restores some, but not all, of those 'inline's for the following
reasons:

  • restored them for lock() and unlock() because for unlocked
    files, the amount of code in those routines is minimal, and
    these proved to be on the critical path in Elliot's experiments

  • restored them for read() and write() because these are the
    powerhouses of I/O, and most of the heavy lifting is done
    in per-argument subroutines, so we don't expect them to hurt
    much.

  • restored them for the _to*() overloads on enums because what little
    complexity was in those routines is a param conditional, so they're
    really no more complex than the others

We thought about restoring all of them just to return to the status
quo, but I left them off the following routines:

  • left them off of mark/offset because these don't seem obviously
    deeply time-critical to me and seem nontrivial

  • left them off of _read_binary_internal/_write_binary_internal/
    _read_one_internal/_write_one_internal because these seem like
    precisely the type of routine that we'd want to stamp out once
    per type being read/written, and they're not obviously trivial.

  • left them off of readbits and chpl_do_format for similar reasons.

Number of lines in the generated *.c code for --fast compiles of fasta.chpl and Arkouda:

             caseyb/fasta   arkouda
master:         31682        894627
this:           31990        908974
% increase:      <1%          1.6%
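
For reference, the percentages in the last row can be rechecked from the line counts above. This is only a sketch: pctIncrease is a hypothetical helper name, not something in the compiler or test system.

```chapel
// Hypothetical helper: percent growth in generated C line count.
proc pctIncrease(oldLines: int, newLines: int): real {
  return 100.0 * (newLines - oldLines) / oldLines;
}

writeln(pctIncrease(31682, 31990));    // caseyb/fasta: just under 1%
writeln(pctIncrease(894627, 908974));  // arkouda: roughly 1.6%
```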

A note for anyone who wants to tune this further in the future:

  • We noted that we could potentially create clones of routines (like
    lock/unlock), using where clauses to inline them in the no-locking
    case, where the routines are virtually no-ops, but not in the locking
    case, where things are going to be more expensive anyway.
  • We also noted that for --no-local compilations, where many of these
    routines contain on-clauses, the inlining will really only be inlining
    the arg bundlings, since the on-clauses will already be pushed out into
    their own functions (for --local compilations, those on-clauses are
    dropped on the floor).
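
A minimal sketch of the where-clause idea from the first bullet, assuming a stand-in type rather than the real IO.chpl code. The record Chan and its lockCount field are hypothetical and exist only for illustration; what the sketch shares with the routines discussed above is a 'locking' param that lets the two overloads be selected at compile time.

```chapel
// Hypothetical stand-in for a channel-like type; not the IO.chpl code.
record Chan {
  param locking: bool;
  var lockCount: int;   // illustration only: counts real lock() calls

  // No-locking clone: virtually a no-op, so inlining costs little.
  inline proc ref lock() where !locking { }

  // Locking clone: does real work, so it is left out-of-line.
  proc ref lock() where locking {
    lockCount += 1;
  }
}

var a = new Chan(locking=false);
var b = new Chan(locking=true);
a.lock();   // resolves to the inline no-op overload
b.lock();   // resolves to the out-of-line overload
writeln(a.lockCount, " ", b.lockCount);
```

Because 'locking' is a param, each instantiation sees only one of the two overloads, so this would not add a runtime branch. Whether it is worth the extra duplication in the module is a separate question.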

Resolves https://github.com/Cray/chapel-private/issues/1787

Modified Files:
M modules/standard/IO.chpl

Compare: https://github.com/chapel-lang/chapel/compare/81371c98836f...7050c26f6d16