New Issue: can the comm counts from CommDiagnostics include cache hit/miss information?

16884, “mppf”, “can the comm counts from CommDiagnostics include cache hit/miss information?”, “2020-12-18T14:33:13Z”

Since we are planning to enable --cache-remote by default, it would be useful to report how effectively the cache is performing alongside the CommDiagnostics counters we already have.

Although the cache is per-pthread, that can be viewed as an implementation detail. In this issue we are proposing having per-locale atomic counters. This will make average program behavior easy to see and is similar to how we already handle GET counts in CommDiagnostics.
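To make the per-locale idea concrete, here is a minimal sketch in C11 of counters that every thread's cache on a locale would update. The type and function names (`locale_cache_counters`, `count_cached_get`, and so on) are hypothetical and are not taken from the Chapel runtime; the sketch only assumes that relaxed atomic increments are sufficient for statistics.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-locale counters: one instance per locale, shared by
   all of that locale's per-pthread caches. Names are illustrative. */
typedef struct {
  atomic_uint_least64_t cache_get_hits;
  atomic_uint_least64_t cache_get_misses;
} locale_cache_counters;

/* Static storage, so both counters start at zero. */
static locale_cache_counters locale_counters;

/* Called by any thread's cache after it classifies a GET request.
   Relaxed ordering is assumed to be enough because the counts are
   only statistics and do not synchronize other memory. */
static inline void count_cached_get(int hit) {
  atomic_uint_least64_t *counter =
      hit ? &locale_counters.cache_get_hits
          : &locale_counters.cache_get_misses;
  atomic_fetch_add_explicit(counter, 1, memory_order_relaxed);
}

static inline uint_least64_t read_get_hits(void) {
  return atomic_load_explicit(&locale_counters.cache_get_hits,
                              memory_order_relaxed);
}

static inline uint_least64_t read_get_misses(void) {
  return atomic_load_explicit(&locale_counters.cache_get_misses,
                              memory_order_relaxed);
}
```

Because all threads add into the same pair of counters, a reader sees locale-wide totals without needing to know how many per-pthread caches exist.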

Further, this issue suggests that the additional cache counters be enabled by the CommDiagnostics counting (just as GET counting is). The chpl_commDiagnostics record, which is returned by getCommDiagnostics, would be extended to include new fields.
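The gating described above could look roughly like the following C sketch, in which the cache counters only advance while diagnostics counting is switched on and a snapshot record carries them next to the existing GET count. Everything here is a stand-in: `start_diags`, `stop_diags`, `get_diags`, and the snapshot field names are hypothetical and only mirror the roles of starting, stopping, and reading CommDiagnostics.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot, standing in for an extended diagnostics record. */
typedef struct {
  uint64_t get;               /* existing kind of counter: GETs issued */
  uint64_t cache_get_hits;    /* proposed: cached GETs needing no comm */
  uint64_t cache_get_misses;  /* proposed: cached GETs needing comm */
} comm_diags_snapshot;

static atomic_bool diags_enabled;  /* false until start_diags() */
static atomic_uint_least64_t n_get, n_cache_get_hits, n_cache_get_misses;

void start_diags(void) { atomic_store(&diags_enabled, true); }
void stop_diags(void)  { atomic_store(&diags_enabled, false); }

/* Increment a counter only while diagnostics counting is enabled. */
static void bump(atomic_uint_least64_t *counter) {
  if (atomic_load_explicit(&diags_enabled, memory_order_relaxed))
    atomic_fetch_add_explicit(counter, 1, memory_order_relaxed);
}

/* A GET that went out on the network. */
void note_get(void) { bump(&n_get); }

/* A GET request that was attempted through the cache. */
void note_cached_get(bool hit) {
  bump(hit ? &n_cache_get_hits : &n_cache_get_misses);
}

/* Copy the current totals into a plain record for the caller. */
comm_diags_snapshot get_diags(void) {
  comm_diags_snapshot s;
  s.get = atomic_load(&n_get);
  s.cache_get_hits = atomic_load(&n_cache_get_hits);
  s.cache_get_misses = atomic_load(&n_cache_get_misses);
  return s;
}
```

With this shape, the cache counters cost one relaxed load per request when diagnostics are off, which is the same trade-off a gated GET counter makes.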

These counters are very useful for discovering how effective the cache is - especially for GETs.

I think we could add the following fields to record chpl_commDiagnostics:

  • For GETs
    • the number of GET requests attempted through the cache
    • the number of “hits” where these requests did not require any comms
      • note that a request spanning multiple pages can have some pages be a hit and others be a miss; in that case we would call it a miss for the purposes of the counters
  • For PUTs
    • the number of PUT requests attempted through the cache
    • the number of “hits” where the request did not require any additional comms
      • for PUTs, we would count it as a “hit” when writing to a page that already has dirty bits. That is because these often don’t require additional comms beyond the comm needed for the pending write represented by the dirty bit. (This is a simplification because it assumes that the dirty regions are adjacent to each other and can be combined. It would be possible to compute whether the new data is adjacent to existing dirty data.)
  • For cache fences
    • the number of cache fences (could separate acquire/release fences if we wanted to)
  • For prefetches
    • the number of explicitly requested prefetches
    • the number of automatic prefetches invoked by the cache
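
The hit/miss rules in the list above can be sketched as small C helpers. This is only an illustration under stated assumptions: the cache is modeled as plain arrays of per-page flags, a PUT is assumed to touch a single page, and the names `get_is_hit`, `put_is_hit`, and `hit_rate` are hypothetical rather than part of the runtime.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* GET rule: a request spanning several pages counts as a hit only if
   every page it touches is already valid in the cache. One missing
   page makes the whole request a miss. */
bool get_is_hit(const bool *page_valid, size_t first_page,
                size_t num_pages) {
  for (size_t i = 0; i < num_pages; i++) {
    if (!page_valid[first_page + i])
      return false;
  }
  return true;
}

/* PUT rule (simplified): writing to a page that already has dirty bits
   counts as a hit, on the assumption that the new data can be combined
   with the pending write. Adjacency is not checked here. */
bool put_is_hit(const bool *page_dirty, size_t page) {
  return page_dirty[page];
}

/* Fraction of attempted requests that were hits; 0.0 if none were
   attempted, to avoid dividing by zero. */
double hit_rate(uint64_t hits, uint64_t attempts) {
  return attempts == 0 ? 0.0 : (double)hits / (double)attempts;
}
```

Storing attempts and hits, as proposed, is enough to derive both the miss count and the hit rate afterwards.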