Branches generated during vector operations

damianmoz · May 2, 2022, 4:15am

Consider the following code snippet (all the vectors are actually of type real(w)).

    inline proc dot(f, g, p, q) // p and q are scalar, the rest are vectors
    {
        // this algebra should be written using 3 FMAs to minimize errors
        // by effectively doing the calculations at additional precision!

        return f * p + g * q;
    }
    inline proc visc(r, m, l, t, s, p, q) // same comment as for dot above
    {
        return dot((r - m) + (l - m), (t - m) + (s - m), p, q);
    }
    inline proc conv(f, g, c, m, l, t, s, p, q) // same comment as for dot above
    {
        return dot((t - s) * ((t + s) + (m + m)), c - (l + m) * (f + g), p, q);
    }
    .......
    // whole lot of declarations
    ....
    forall i in 2..Nx-1 do
    {
        const jm1 = 1..Ny-2, j = 2..Ny-1, jp1 = 3..Ny, jp2 = 4..Ny+1, ip1 = i+1;
        const ref uni = un[i, ..], unip1 = un[ip1, ..];
        const ref vni = vn[i, ..], vnip1 = vn[ip1, ..];
        const ref ur = unip1[jp1], um = unip1[j], uij = uni[j];
        const ref vr = vnip1[jp1], vm = vni[jp1], vij = vni[j];

        const uvrm = (um + ur) * (vm + vr);

        // in X
        {
            const ref ul = unip1[jm1], vip1j = vnip1[j], ui2j = un[i+2, j];
            ref vx = VISCx[ip1, j], cx = CONVx[ip1, j];

            vx = visc(ur, um, ul, ui2j, uij, rdy2, rdx2);
            cx = conv(vip1j, vij, uvrm, um, ul, ui2j, uij, rqdx, rqdy);
            ustar[ip1, j] = um + dt * (damp * vx - cx);
        }
        // in Y
        {
            const ref vl = vn[i-1, jp1], uijp1 = uni[jp1], vij2 = vni[jp2];
            ref vy = VISCy[i, jp1], cy = CONVy[i, jp1];

            vy = visc(vr, vm, vl, vij2, vij, rdx2, rdy2);
            cy = conv(uijp1, uij, uvrm, vm, vl, vij2, vij, rqdy, rqdx);
            vstar[i, jp1] = vm + dt * (damp * vy - cy);
        }
    }

Within that inner loop, there are about 41 vector operations, all of length Ny-2. They can be written in scalar form to live within a single for or foreach loop which would have a single branch.

But, in the form above, where the algebra is written in terms of vector operations, how many branches will be created? Is it still just one?

Thanks.

Topic		Replies	Views
Slicing with rank-reduction performance Users	4	245	May 10, 2022
Werid performance slow-down Users	3	177	January 12, 2023
LinearAlgebra module in 1.24 Users	8	313	May 4, 2021
Calling C for vector operations Users	2	199	November 9, 2021
Minimum of Two Real numbers Users	9	189	August 9, 2022

Branches generated during vector operations

Related Topics