A '-S' flag and separate compilation

Will the Chapel compiler ever be able to support a '-S' flag, i.e. pass a Chapel file with one or more individual routines to the compiler and have it output the assembler?

Hi Damian,

We do not have plans for -S at the moment. Is this something you can do with a workaround?

I am thinking of passing --savec to chpl and then manually compiling the generated C code. I would run 'make' by hand to see the compilation command, then modify that command.
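If running 'make' by hand reports nothing to rebuild, GNU make's -n (dry run) and -B (treat every target as out of date) options can list the recipe commands without executing them. The sketch below shows this on a throwaway Makefile; the file names and the show_make_commands helper are made up for illustration, and whether the Makefile that chpl generates exposes its compile line this way is an assumption.

```shell
# Demonstrate on a throwaway Makefile: once the target is up to date, a plain
# `make` prints no recipe, but `make -n -B` lists every command without
# running any of them. (Hypothetical helper; requires GNU make.)
show_make_commands() (
  dir=$(mktemp -d) || exit 1
  cd "$dir" || exit 1
  printf 'hello\n' > in.txt
  printf 'out.txt: in.txt\n\tcp in.txt out.txt\n' > Makefile
  make -s          # real build, so out.txt is now up to date
  make -n -B       # dry run (-n) that treats all targets as stale (-B)
  cd / && rm -rf "$dir"
)

show_make_commands
```

For the saved Chapel build, the analogous (untested) invocation would be 'make -f tmp/Makefile -n -B'.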

Vass

I can endorse Vass' solution. It has worked well for me in the past.

Hi Damian —

I think that it's reasonable to expect that Chapel could and should
support a -S flag, though it has not historically been a priority.
Filing a feature request issue for it could be a first step to start to
change that, particularly if it's important / blocking.

The "and separate compilation" part is trickier since Chapel doesn't
support separate compilation at all today.

Note that the approach Vass and David are suggesting only works
when using the C back-end (CHPL_TARGET_COMPILER != llvm). If you were to
add --incremental to your command-line options, which causes each Chapel
module's '.c' file to be compiled separately, you could likely home in on
a specific section of code more easily.

-Brad

Hi Vass,

I have tried to figure out a workaround. Maybe I am just slow.

I compiled a small program

chpl --savec tmp s.chpl

and then I did

make -f tmp/Makefile

and the output was

rm -f s
mv tmp/s.tmp s

so not a lot of joy there.

I tried objdump on the executable s and, well, looking through 200,000 lines of assembler is beyond my skill level.

Brad, I built the compiler (the one with AsBits() working for param) with CHPL_LLVM=bundled.

I assumed I had the LLVM compiler but anyway, I set CHPL_TARGET_COMPILER=llvm.

chpl --savec tmp s.chpl
make -f tmp/Makefile

and it says

rm -f s
mv tmp/s tmp s

which is not very helpful.

Having -S is not a high priority but not having it is not productive.

These days my development cycle goes like

develop/debug code in Chapel code, say `t.chpl`
find crucial routines from `t.chpl`
manually recode them as `t-crucial-bits.c`
repeat
   gcc11 -O3 -S -fno-math-errno -mfma t-crucial-bits.c
   ... review assembler and tweak C code
   clang14 -O3 -S  -ffp-exception-behaviour -frounding-math t-crucial-bits.c
   ... review assembler and tweak C code
until minimalist assembler achieved
refactor t-crucial-bits.c back into the Chapel code

Not an overly satisfactory cycle

I am a bit of a fan of McIlroy's negative coding approach, i.e. I want the smallest number of lines of assembler generated for a given amount of Chapel (or its C translation). The program which produces the smallest number of lines of assembler also generally tends to be the fastest. It is often the simplest to read and understand, if only from the perspective that, within a single routine, my eyes start to water beyond 50 lines of assembler and my brain shuts down at slightly over 100 lines of assembler.
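The "lines of assembler" metric can be tallied mechanically rather than by eye. Below is a minimal sketch, assuming x86 output in GNU assembler syntax (as written by 'gcc -S' or 'clang -S'), where '#' starts a comment, directives begin with '.', and labels end with ':'. The function name count_asm_instructions is made up for illustration.

```shell
# Count instruction lines in a GNU-style x86 assembly listing.
# Skips blank lines, comment-only lines, directives and local labels
# (leading "."), and ordinary labels (trailing ":").
count_asm_instructions() {
  awk '
    {
      sub(/#.*/, "")                  # drop trailing comments
      gsub(/^[ \t]+|[ \t]+$/, "")     # trim surrounding whitespace
      if ($0 == "") next              # nothing left on the line
      if ($0 ~ /^\./) next            # directive or local label such as .L2:
      if ($0 ~ /:$/) next             # label
      n++
    }
    END { print n + 0 }
  ' "$1"
}
```

Running it on the .s files from two compilers, or from two versions of the same routine, gives a quick number to compare. Other targets (ARM, for instance, uses '#' for immediates) would need a different comment rule.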

It would make me lots more productive if I could just do

develop/debug code in Chapel code, say `t.chpl`
find crucial routines from `t.chpl`
cut and paste them into `t-crucial-bits.chpl` (probably only a single module)
create a driver to exercise the **proc**s in t-crucial-bits.chpl
repeat
   chpl -S -o t.s --ieee-float --fast t-crucial-bits.chpl
   view t.s
until minimalist assembler is achieved
cut and paste t-crucial-bits.chpl back into t.chpl

Looks a lot easier to me.

It would be good to have such a feature sometime.

David, thanks for your input. Once I get a reply from @vass or @bradcray which might point to the error of my ways (or thinking), I should be able to reply more intelligently to your post.

I'd just like to note that we already have an issue about this: "Assembler Output of individual code chunks - Desirable / Low Priority" (Issue #15043, chapel-lang/chapel on GitHub).

I've known how to do this for a while and, Damian, it sounds like it has been a thorn in your side, so I went ahead and drafted something to do it. Please see "Add support for displaying resulting assembly" by mppf (Pull Request #21076, chapel-lang/chapel on GitHub). I'd appreciate help in testing this PR (I prototyped it quickly but have not gotten to testing it more).

Here is an example

// bb.chpl
config const n = 10_000;
proc foo() {
  var result = 0.0;
  for i in 1..n {
    result += sqrt(i*i);
  }
  return result;
}

proc bar() {
  var total = 0.0;
  for i in 1..10 {
    total += foo();
  }
  return total;
}

writeln(foo());
writeln(bar());

$ chpl bb.chpl --fast --llvm-print-ir foo,bar --llvm-print-ir-stage asm
...
shows assembly for foo and bar
...

If you aren't able to try out that PR, you could run the commands manually:

$ chpl bb.chpl --fast  --llvm-print-ir foo,bar --llvm-print-ir-stage full --savec=tmp

... LLVM IR representation output you can ignore...

$ objdump --disassemble=foo_chpl tmp/chpl__module.o

... assembly output ...

Note that using --llvm-print-ir might be necessary to disable inlining for that symbol (otherwise objdump might not find it). Additionally, if the function isn't called, the Chapel compiler currently won't compile it, so you'll need the calls to it in your test program.
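The two manual steps can be wrapped in a small helper so the symbol name only has to be typed once. This is a sketch rather than a tested tool: chpl_show_asm is a made-up name, and it assumes the compiled symbol is the Chapel procedure name plus "_chpl" (as with foo and foo_chpl above) and that the object file is chpl__module.o in the --savec directory. By default it only prints the commands; set RUN=1 to execute them.

```shell
# Print (or, with RUN=1, execute) the chpl and objdump steps shown above.
# Usage: chpl_show_asm <file.chpl> <proc-name> [savec-dir]
# Assumes simple arguments without spaces.
chpl_show_asm() (
  src=$1
  proc=$2
  dir=${3:-tmp}
  step1="chpl $src --fast --llvm-print-ir $proc --llvm-print-ir-stage full --savec=$dir"
  step2="objdump --disassemble=${proc}_chpl $dir/chpl__module.o"
  if [ "${RUN:-0}" = "1" ]; then
    # Discard the LLVM IR dump, then show the disassembly.
    $step1 > /dev/null && $step2
  else
    printf '%s\n%s\n' "$step1" "$step2"
  fi
)

chpl_show_asm bb.chpl foo
```

As noted above, the procedure still has to be called somewhere in the test program, otherwise there is no symbol for objdump to find.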

Thanks for this, especially for the fast turnaround. I will see what I need to do. I assume this means rebuilding the compiler? I am a total novice when it comes to a PR.

Yes, you would check out the branch from the PR (or apply the changes manually if that is easier for you). The branch is resolve-15043 in mppf/chapel on GitHub. You would indeed need to rebuild the compiler.

Hi @damianmoz

I'm catching up on mail and was curious whether you were ever able to give this a try. Now that 1.29.0 is out, the capability that Michael implemented is available there in its current form.

-Brad

Sorry. I got distracted into helping others in the team with Chapel and loads of paperwork, the latter being my day-job (sadly). Hopefully next week.

No worries, just curious!

-Brad

Tested. It works. It looks quite useful as a comparison too.

Thanks heaps.
