As much as I like our design, I understand that what runs where is a bit confusing, and choosing the right words to describe it can be difficult. The overarching rule is that when you are executing on a GPU sublocale, e.g. inside something like `on here.gpus[0]`, only GPU-eligible things will execute on the GPU; everything else is still executed by the CPU.
Here are some examples to help:
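config const n = 10;             // problem size used below
record myRecord { var a: int; }  // a simple record type
class MyClass { var a: int; }    // a simple class type

// you can't tell whether a function will execute on the GPU or CPU
// by just looking at its definition
proc foo() {
  var x: int;
  var A: [1..n] int;
  forall a in A do a = x + 1;    // a GPU-eligible loop
}

// we just started executing on the CPU; nothing will magically end up on the GPU.
// foo will be invoked by the CPU, and
// the forall inside foo will also execute on the CPU even though
// it is GPU-eligible.
foo();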
on here.gpus[0] {
  // now we are executing on the GPU _sublocale_
  // that doesn't necessarily mean everything inside this block will
  // be executed by the GPU
  var x: int;             // this is a scalar, it will _not_ be on the GPU
  var r: myRecord;        // this is a record value, it will _not_ be on the GPU
  var t: 3*int;           // ditto, records and tuples are similar: _not_ on the GPU
  var c = new MyClass();  // classes are different -- this will be allocated on the GPU,
                          // but will still be accessible by the CPU with some magic
  var Arr: [1..n] int;    // arrays will be allocated on the GPU;
                          // the CPU can access them through communication under the hood,
                          // but doing so will not be great for performance
  for i in 1..n do x += i;         // a for loop is sequential -- executed on the CPU
  forall i in 1..n do Arr[i] = i;  // if GPU-eligible, will turn into a kernel
  Arr = 3;       // this is a whole-array operation; under the hood it is actually a forall,
                 // so it will execute on the GPU
  writeln(Arr);  // you can do this; it will execute on the CPU because it is IO.
                 // elements will be read one-by-one with communication (i.e. cudaMemcpy)
  foo();         // the actual function call will be executed by the CPU.
                 // all the rules I outlined above now apply to the body of foo,
                 // e.g. "var x" within the body will be in CPU memory
                 // e.g. the forall within the body will execute as a kernel
                 // (the same loop can execute both on the CPU and GPU depending on the context)
}
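One way to observe the "same loop, different context" behavior on your own system is to count kernel launches around two calls to the same procedure. This is a minimal sketch, assuming a GPU-enabled Chapel build that provides the `GpuDiagnostics` module (`startGpuDiagnostics`, `stopGpuDiagnostics`, `getGpuDiagnostics`); `fill`, `CpuArr`, `GpuArr` and `total` are hypothetical names for illustration only. The exact counts printed depend on your Chapel version and configuration, but any kernel launches should come from the work inside the `on` block, not from the call made on the CPU.

```chapel
// Assumption: a GPU-enabled Chapel build that provides the GpuDiagnostics module.
use GpuDiagnostics;

config const n = 8;

// Hypothetical helper: its forall is GPU-eligible, but whether it becomes
// a kernel depends on where fill() is called, not on its definition.
proc fill(ref A: [] int) {
  forall i in A.domain do A[i] = 2 * i;
}

var total = 0;              // a scalar declared on the CPU: lives in host memory

startGpuDiagnostics();

var CpuArr: [1..n] int;     // declared on the CPU: lives in host memory
fill(CpuArr);               // called from the CPU: the forall runs on the CPU

on here.gpus[0] {
  var GpuArr: [1..n] int;   // declared on the GPU sublocale: allocated on the GPU
  fill(GpuArr);             // same procedure, but now its forall becomes a kernel
  total = + reduce GpuArr;  // the result is written back to the host-side 'total'
}

stopGpuDiagnostics();

// Kernel launches reported here should come only from the work in the 'on' block.
writeln(getGpuDiagnostics());
writeln(total);             // prints 72 for the default n = 8
```

Taking the array by `ref` keeps `fill` agnostic about where its argument lives, which is exactly why reading its definition alone cannot tell you whether its loop will run on the CPU or the GPU.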