gpuClock() only returns 0

e-kayrakli · April 1, 2025, 5:49pm

As much as I like our design, I understand that what runs where is a bit confusing, and choosing the right words to describe things can be difficult. The overarching rule is: if you are executing on a GPU sublocale, e.g. with something like on here.gpus[0], only GPU-eligible things will execute on the GPU. Everything else still runs on the CPU.

Here are some examples to help:

// you can't tell whether a function will execute on the GPU or CPU
// by just looking at its definition
proc foo() {
  var x: int;
  forall ...
}

// we just started executing on the CPU, nothing will magically end up on the GPU.

// foo will be invoked by the CPU
// the forall inside foo will also execute on the CPU, even if
// it is GPU-eligible.
foo(); 

on here.gpus[0] {
  // now we are executing on the GPU _sublocale_
  // that doesn't necessarily mean everything inside this block will
  // be executed by the GPU

  var x: int; // this is a scalar, it will _not_ be on the GPU
  var r: myRecord; // this is a record value, it will _not_ be on the GPU
  var t: 3*int; // ditto, records and tuples are similar. _not_ on the GPU
  var c = new MyClass(); // classes are different -- the instance will be allocated on the GPU
                         // but still will be accessible by the CPU with some magic
  var Arr: [1..n] int; // arrays will be allocated on the GPU
                       // the CPU can access it through communication under the hood
                       // it will not be great for performance if you use this on CPU

  for ... // a for loop is sequential -- executed on the CPU
  
  forall ... // if GPU-eligible, will turn into a kernel

  Arr = 3;  // this is a whole-array operation. Under the hood, this is actually a forall
            // this will execute on the GPU

  writeln(Arr);  // you can do this, it will execute on the CPU because it is IO.
                 // elements will be read one-by-one with communication (i.e. cudaMemcpy)

  foo();  // the actual function call will be executed by the CPU.
          // all the rules I outlined above now apply to the body of foo
          // e.g. "var x" within the body will be in CPU memory
          // e.g. the forall within the body will execute as a kernel
          // (the same loop can execute on either the CPU or the GPU depending on the context)
}
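
If you want to check these rules on your own system, one option is to count kernel launches with the GpuDiagnostics module. The following is only a sketch, not something from the discussion above: bump, HostArr and DevArr are hypothetical names, and it assumes a Chapel build with GPU support (CHPL_LOCALE_MODEL=gpu) and at least one GPU on the locale.

```chapel
use GpuDiagnostics;

config const n = 10;

// Hypothetical helper: a single GPU-eligible forall.
// Where it runs depends on the caller's context, not on this definition.
proc bump(ref A: [] int) {
  forall a in A do
    a += 1;
}

// Called from the CPU locale: the forall runs on the CPU.
var HostArr: [1..n] int;
startGpuDiagnostics();
bump(HostArr);
stopGpuDiagnostics();
writeln("after CPU call: ", getGpuDiagnostics());

on here.gpus[0] {
  var DevArr: [1..n] int;  // allocated in GPU memory

  // Start counting only after the array exists, so its
  // initialization is not mixed into the numbers.
  startGpuDiagnostics();
  bump(DevArr);            // the call is made by the CPU,
                           // the forall inside should become a kernel
  stopGpuDiagnostics();

  writeln(DevArr);         // IO: executed by the CPU
}
writeln("after GPU-sublocale call: ", getGpuDiagnostics());
```

I would expect the first report to show no kernel launches and the second to include one from the forall in bump, but the exact counts and the output format can differ between Chapel versions, so treat the numbers as something to inspect rather than a guarantee.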
