Dumb questions

I hope you don't mind, but to shortcut some decisions I thought I'd pick some brains instead of slogging through documents and trial and error. Apologies in advance.

First, I have a tool named flx, which was originally designed to invoke the Felix compiler flxg (which produces C++), compile the generated C++, find all the required libraries automatically, link the final product (an executable, shared library, or archive), and run it. But it can also handle native C and C++ (no Felix involved) and some OCaml, and vague attempts have been made to integrate Go and other languages.

The tool has a few core features. First, you can write commands that are identical on Linux, Mac, and Windows and that do exactly the same thing on each of these platforms.

Second, the C++ compiler is chosen by selecting a plugin toolchain controller. Toolchains are not allowed to have semantic options: if you want different options, you have to use a different toolchain. As a result, toolchains have a fixed set of capabilities, with no ifs, buts, or configuration nightmares. A toolchain can compile to a static object file, compile to a dynamic object file, build an archive library, link a shared library, link an executable, and check for compilation dependencies (header files), and that's all you're allowed to do.
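To make the "fixed capabilities" idea concrete, here is a rough C++ sketch of such a toolchain interface. Everything in it is hypothetical: the class and method names are made up for illustration (they are not flx's actual plugin API), and the gcc-style command lines are just one plausible fixed configuration. The methods return the command they would run instead of running it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical interface: exactly the six capabilities, and no option setters.
class Toolchain {
public:
    virtual ~Toolchain() = default;
    virtual std::string compile_static_object(const std::string& src, const std::string& obj) const = 0;
    virtual std::string compile_dynamic_object(const std::string& src, const std::string& obj) const = 0;
    virtual std::string build_archive(const std::vector<std::string>& objs, const std::string& lib) const = 0;
    virtual std::string link_shared_library(const std::vector<std::string>& objs, const std::string& lib) const = 0;
    virtual std::string link_executable(const std::vector<std::string>& objs, const std::string& exe) const = 0;
    virtual std::string dependencies(const std::string& src) const = 0;
};

// Join file names with single spaces.
static std::string join(const std::vector<std::string>& v) {
    std::string out;
    for (const auto& s : v) {
        if (!out.empty()) out += ' ';
        out += s;
    }
    return out;
}

// Hypothetical concrete toolchain: g++ on Linux with -O2 baked in.
// A different optimisation level would be a different class, not a flag.
class GccLinuxO2 : public Toolchain {
public:
    std::string compile_static_object(const std::string& src, const std::string& obj) const override {
        return "g++ -O2 -c " + src + " -o " + obj;
    }
    std::string compile_dynamic_object(const std::string& src, const std::string& obj) const override {
        return "g++ -O2 -fPIC -c " + src + " -o " + obj;
    }
    std::string build_archive(const std::vector<std::string>& objs, const std::string& lib) const override {
        return "ar rcs " + lib + " " + join(objs);
    }
    std::string link_shared_library(const std::vector<std::string>& objs, const std::string& lib) const override {
        return "g++ -shared " + join(objs) + " -o " + lib;
    }
    std::string link_executable(const std::vector<std::string>& objs, const std::string& exe) const override {
        return "g++ " + join(objs) + " -o " + exe;
    }
    std::string dependencies(const std::string& src) const override {
        return "g++ -MM " + src;  // list header dependencies
    }
};
```

Wanting, say, -O0 instead means writing another class, which is the point: there is nowhere to put a semantic option.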

Third, you do not specify search paths for either header files or libraries, nor are you allowed to specify the names of libraries. Instead, the language itself specifies abstract resources, and the tool uses another tool, flx_pkgconfig, to query the configuration database of meta-data to find out what is required (it's similar to pkg-config but is a fully general database). So when you say to "run" a Felix program, it just runs, with no arguments other than the file to run.
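A minimal sketch of how abstract resource names might be resolved against such a database, following transitive requirements. The `Package` fields are loosely modelled on fpc-style entries (requirements, cflags, separate static and dynamic libraries), but the struct, the in-memory map, and the `resolve` function are all hypothetical; the real flx_pkgconfig reads `.fpc` files and does much more.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for one .fpc entry.
struct Package {
    std::vector<std::string> requires_;  // other abstract package names
    std::vector<std::string> cflags;     // e.g. include search paths
    std::string slib;                    // library used when linking statically
    std::string dlib;                    // library used when linking dynamically
};

using Database = std::map<std::string, Package>;

struct BuildFlags {
    std::vector<std::string> cflags;
    std::vector<std::string> libs;
};

// Depth-first walk: each package is visited once; its library is recorded
// after the libraries of the packages it requires (post-order).
static void visit(const Database& db, const std::string& name, bool static_link,
                  std::set<std::string>& seen, BuildFlags& out) {
    if (!seen.insert(name).second) return;  // already handled
    auto it = db.find(name);
    if (it == db.end()) throw std::runtime_error("unknown package: " + name);
    const Package& p = it->second;
    for (const auto& dep : p.requires_) visit(db, dep, static_link, seen, out);
    out.cflags.insert(out.cflags.end(), p.cflags.begin(), p.cflags.end());
    const std::string& lib = static_link ? p.slib : p.dlib;
    if (!lib.empty()) out.libs.push_back(lib);
}

// Hypothetical entry point: abstract names in, concrete flags out.
BuildFlags resolve(const Database& db, const std::vector<std::string>& wanted, bool static_link) {
    BuildFlags out;
    std::set<std::string> seen;
    for (const auto& name : wanted) visit(db, name, static_link, seen, out);
    // Reversed post-order lists each library before the ones it depends on.
    std::reverse(out.libs.begin(), out.libs.end());
    return out;
}
```

Reversing the post-order gives the order traditional Unix linkers want for static archives, and keeping separate static and dynamic entries per package avoids the `-lname` ambiguity raised later in the thread.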

The tool does automatic dependency checking. No makefiles are required.

Also, the tool never writes anything anywhere unless you tell it to, except into a special cache directory. So when you run a program by specifying the source file, all the binary crud is put in the cache; unless you look there, you will not find the executable.

So... I want to add Chapel to the tool's capabilities. For dependency specification, I can program the Chapel toolchain to scan the source code for special comments that specify requirements. This is built into the Felix language; for C++, I use these special comments to allow autolinkage.

Q1: what can/does the compiler generate? Is it only an executable?

Q2: Chapel has a way to lift C types and functions. Does this work for C++ types and functions? Does it work, and if so how, if the LLVM backend is used?
Is there an in-language way to specify abstract dependencies?

To explain, here is how it looks in Felix:

type intvector = "::std::vector<int>" 
   requires header "#include <vector>";
fun f(x:intvector)=> ...;

Now, if the function f is called, you must have an intvector value, so the intvector type is used, and so when Felix emits C++, it also emits the required #include directive. If it is not used, the #include is not emitted. You can also write:

  .. requires package "metadata" ..

and now this causes the toolchain to look for metadata.fpc in the configuration database. That file may contain a specification to link some library. In C++, the requires clause is put in a comment:

//@requires package "metadata" 

Unless Chapel has a similar feature, I need to do as I do for C++ and put the specs in a comment. That's not optimal, because in Felix only the resources actually required are included when building, whereas a comment applies to the whole file regardless of what is used. The toolchain has to scan the C++ file for these special comments to establish the abstract resource name, so that it can find the resource's meta-data definition in the configuration database. I hope this will work for Chapel too.
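A minimal sketch of that comment scan, assuming exactly the `//@requires package "name"` form shown above with one requirement per line (Felix's real syntax accepts more). The function name `scan_requirements` is hypothetical; it uses only `std::regex` from the standard library.

```cpp
#include <cassert>
#include <istream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical scanner: collect abstract package names from lines such as
//   //@requires package "metadata"
std::vector<std::string> scan_requirements(std::istream& in) {
    // Custom raw-string delimiter because the pattern itself contains )"
    static const std::regex pattern(R"re(//@requires\s+package\s+"([^"]+)")re");
    std::vector<std::string> names;
    std::string line;
    while (std::getline(in, line)) {
        std::smatch m;
        if (std::regex_search(line, m, pattern)) names.push_back(m[1].str());
    }
    return names;
}
```

The same approach would serve for a Chapel source file if the pattern were adjusted to whatever marker is chosen there.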

Q3: The tool also gets rid of the need to specify environment variables to run the compilers and/or the executables under its control. It uses the meta-data database to find what is required and sets it all up for you. Of course, if you run an executable standalone and it needs, say, a Chapel-related resource such as a shared library with run-time support, it can't help you. If the default is not to your liking, you can specify a build control package on the command line. For example:

flx --chapel=chplsetup helloworld.chpl

will find the chplsetup.fpc file and set the environment variables, command-line switches, etc., needed to run the Chapel compiler, and then run the program. I have to program flx to support this, so I need to know what is required.

Not at all, that's what we're here for (well, up to a point).

Q1: what can/does the compiler generate? Is it only an executable?

By default it simply generates an executable (or two, if you are compiling for distributed memory: one to launch the program, and one that is the per-node program). If you are using the C-based back-end, you can use --savec [directory-name] to capture the intermediate C code that we use as our portable assembly, along with a Makefile to build it. As of Chapel 1.25.0, the LLVM back-end has become the default, and I believe that the (now dated, name-wise) --savec flag saves some sort of LLVM byte-code into the specified directory, but I haven't had much of a chance to kick that around, so I'm less familiar with it.

Q2: Chapel has a way to lift C types and functions.

By "lift," do you mean call out to them / wrap them / make them available to Chapel programmers? If so, yes. One technique is to manually use extern declarations that use Chapel syntax to describe the C type, routine, or variable, where some white lies are permitted as long as they tell Chapel what it needs to know and the details can be resolved by the C-time compilation of the file. A second is to use our c2chapel tool which will take in a C header file and generate those extern declarations for you. The third is to embed the C declarations within the Chapel code itself using an extern { ... /* C declarations go here */ ... } block which utilizes clang to parse the C code and make the symbols available to the user. More can be learned about these options in the tech notes and language specification (where these sections are overdue to be merged, I believe).

Does this work for C++ types and functions?

No, though it would be nice if it did. Currently, we rely on wrapping any C++ code in C wrappers and calling to that.

Does it work, and if so how, if the LLVM backend is used?

Yes, what we have works and we rely on it heavily, both for bootstrapping the language and for growing our set of libraries by wrapping existing ones to avoid reinventing the wheel. These features are fairly independent of the back-end and are more about the front-end and IR, though, as noted, clang is used for the third ("embed C code into my Chapel code") approach.

Is there an in-language way to specify abstract dependencies?

There is a require statement that permits the code to say "I rely on this C header file" or "I rely on this library." Essentially, to provide the real definition of the external code in the approaches above (well, the ones that don't directly embed the code into Chapel anyway), you can specify .h, .c, .o, .a, ... files on the Chapel compiler's command-line, and when it comes time, it will compile and/or link those files into the generated program. The require statement is a means to move those requirements off of the command-line and into the code.

So transcribing your Felix example, and moving it away from C++ towards something more C-like, you might do something like:

require "vector.h";
extern type intvector = opaque;
extern proc f(x:intvector);
var iv: intvector;
f(iv);

where intvector would need to be the C identifier for the type, and opaque says "I can't really describe this to you, Chapel, in a way that would be useful." Another option is to declare an extern record that mimics the C struct being described by specifying some or all of its fields and their types.
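Since C++ is reached through C wrappers, here is a minimal sketch of the kind of C interface a header like the `vector.h` above might declare around `std::vector<int>`, with the C++ implementation behind an opaque handle. All function names (`intvector_new`, `intvector_push`, and so on) are hypothetical; the header part and the implementation part are shown in one listing, and any Chapel-side extern declarations would have to match these signatures.

```cpp
// ---- vector.h: the C-visible interface (hypothetical names) ----
#include <stddef.h>
#ifdef __cplusplus
extern "C" {
#endif
typedef struct intvector intvector;  /* opaque: C callers never see inside */
intvector* intvector_new(void);      /* returns NULL on allocation failure */
int intvector_push(intvector* v, int x);  /* 0 on success, -1 on failure */
size_t intvector_size(const intvector* v);
int intvector_get(const intvector* v, size_t i);
void intvector_free(intvector* v);
#ifdef __cplusplus
}
#endif

// ---- vector.cpp: compiled with a C++ compiler ----
#include <cassert>
#include <new>
#include <vector>

struct intvector {
    std::vector<int> items;
};

extern "C" intvector* intvector_new(void) { return new (std::nothrow) intvector(); }

extern "C" int intvector_push(intvector* v, int x) {
    try {
        v->items.push_back(x);
        return 0;
    } catch (...) {
        return -1;  // never let a C++ exception cross into C code
    }
}

extern "C" size_t intvector_size(const intvector* v) { return v->items.size(); }

extern "C" int intvector_get(const intvector* v, size_t i) {
    assert(i < v->items.size());  // caller must stay in bounds
    return v->items[i];
}

extern "C" void intvector_free(intvector* v) { delete v; }
```

Keeping exceptions and allocation failures from escaping is the main design constraint: only plain C types and return codes cross the boundary.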

I should also mention that we have a separate tool, mason, a package manager whose design was heavily influenced by Rust. So another way to talk about dependencies, requirements, and how source code fits together is through mason.

Q3: The tool also gets rid of the need to specify environment variables to run the compilers and/or the executables under its control.

Chapel has a lot of environment variables that govern its compilation (which can also be set or overridden using command-line flags), largely because supercomputers have a lot of variation and options for how they're deployed. These can be embedded in a chplconfig file to avoid having to establish them in each session, or to associate defaults with a particular Chapel installation.

Hope this is helpful,
-Brad

I couldn't find the require statement in the docs. Is it documented?

I don't understand how C types are used if you don't send them, together with the definitions from an include file, to a C compiler. If you're using LLVM, you're bypassing the C compiler, aren't you? Or are you using clang to generate IR of the same kind Chapel produces and then merging the two?

If you are using a C compiler somehow, it's trivial to change the compiler to C++. It should still process all the C code. The reason to do so is clear: C++ can do lots of things that are hard in C. It can probably generate better code than LLVM.

When I looked at the compiler briefly, I saw some calls to LLVM, but I didn't do any extensive inspection. I understand you can easily tell LLVM about C function calls, but types are another story. Values of types have to be copied, assigned, moved, and destroyed, and C++ handles all that. OTOH, a C struct can include variables of other struct types whose size you don't know, so you would need to recursively describe every C struct needed to Chapel (which still can't calculate the sizes).

Yes, though I seem to have mis-pasted the hyperlink in my previous response. I've edited the post on Discourse, but here's the correct one: C Interoperability — Chapel Documentation 1.32

For the C back-end, any .h file from a require statement or the command line gets added as an #include in the back-end code. For the LLVM back-end, as I understand it, we use clang to read the header file in order to give the compiler the C definitions as IR alongside the Chapel-generated IR.

If you are using a C compiler somehow, it's trivial to change the compiler to C++.

I think ingesting C++ and generating it are the easier parts. Teaching the Chapel compiler enough about C++ semantics to use C++ and understand it well enough feels like the harder part to me. We did a thought experiment about this once, when I had convinced myself that it shouldn't be that bad, and I left feeling the opposite. I'm not finding my notes quickly, but I will look again tomorrow, or else have others remind me of what I'm forgetting.

so you would need to recursively describe every C struct needed to Chapel

Generally speaking, Chapel doesn't need to know details of the memory layout of the C struct, which is one of the types of "white lies" I mentioned earlier. If a C struct S has fields x, y, and z of types int, double, and char*, respectively, and you only care about y in Chapel, you can tell Chapel:

extern record S {
  var y: real;  // real in Chapel is like double in C
}

and within the Chapel code, you will only be able to access the y field. But the other fields are still there and will be filled in by the C compiler and linker (for the C back-end) or clang parsing the header files (for the LLVM back-end) such that everything will be fine in terms of the memory layout and such. Chapel doesn't support a compile-time sizeof() for record types since we defer that to the back-end, so any such queries generate execution-time values (implemented using C-style sizeof()).
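On the C side, the full struct from that example might look like the following. Only `y` would be described to Chapel; the real layout, including any padding after `x`, comes from the C compiler. The helpers `S_size` and `S_get_y` are hypothetical illustrations of an execution-time size query and a field access compiled against the full definition; they are not code the Chapel compiler emits.

```cpp
#include <cassert>
#include <stddef.h>

extern "C" {

// The real definition, as the C header would declare it.
struct S {
    int x;
    double y;  // the only field the Chapel `extern record S` mentions
    char* z;
};

// Hypothetical execution-time size query, answered by the compiler's layout.
size_t S_size(void) { return sizeof(struct S); }

// Hypothetical accessor: code compiled against the full definition reads y
// at the right offset, whatever padding precedes it.
double S_get_y(const struct S* s) { return s->y; }

}  // extern "C"
```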

Signing off for the day,
-Brad

OK, thanks, found the require statement. Cool. [The Discourse web interface, on the other hand, is extremely annoying. It keeps popping up stupid messages and forcing me to reload whilst editing.]

The interaction of require with params is nice. However, the Felix machinery is better because it specifies an abstract resource. This is platform independent and also handles linking. The platform dependencies go into the meta-data database, which is used to translate them. I wonder if Chapel could be modified to allow an abstract specification?

Basically, Chapel allows you to put command-line switches in the program, like -lname or -Lpath and maybe -Ipath for header file searches, which are passed on to the compiler. This removes the need to specify that stuff on the command line, which is good, but it also means you have to edit the program for every platform, unless you use params, in which case you're back to the command line again.

The requirement is to autobuild code without any switches on the command line. What Felix does is this: the compiler emits a list of abstract package names, and the build tool grabs them and looks them up to find platform-specific information like search paths, the actual libraries to link, etc. This information is then passed on to the C++ compiler. So the model is: translate Felix to C++ plus a package list, look up the package list, then call the C++ compiler/linker etc. using the relevant information from the package database. When building C or C++, the tool first scans the files for the magic comments containing the requirements.

I wonder how to make this work in Chapel? If it were only generating C, and the front-end C generator were split from the C compilation step, it would be easy; I could do it externally. But if Chapel goes straight through the front end to the back end in a single uninterrupted process, it would have to be done by Chapel itself. Hmm. Or, if the language were modified to allow

require package "name";

which Chapel itself ignored, I could scan the source files for that statement, and get the abstract names that way.

I note that the hassles are usually with Windows. However, there are other issues. For example, my system always uses distinct names for static-link and dynamic-link object files and libraries, due to design faults in Unix tool command-line switches like -lname, which can find the wrong kind of file. So if you can link both statically and dynamically, an embedded -lname switch cannot work, because it cannot cover both cases. Using a param fixes that, of course, but then we're back to the command line again.

For the C back-end, any .h file from a require statement or the command line gets added as an #include in the back-end code. For the LLVM back-end, as I understand it, we use clang to read the header file in order to give the compiler the C definitions as IR alongside the Chapel-generated IR.

Ok, that makes sense.

On C++ semantics, what Felix does is simple: It mandates all C++ types are semi-regular, that is, they must have default constructors, copy initialisation, copy assignment, and destructors. In other words they're required to be first class types like an int. In particular, Felix never initialises variables, it always assigns them, like C, so assignment of a default initialised variable is required to be the same as initialising the variable with the same value, end of story.

If your C++ type does not meet these requirements, and a lot of useful types do not, then you must use a pointer instead. Pointers are first class, so they meet the requirements. You can hide the fact that it's a pointer, which I do all the time. For example, I use Google RE2 for regular expressions, and it is not a movable object, so I just use C++ shared_ptr<RE2> instead. That's default initialised to point to nothing, and it also handles deletion automatically.
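The pointer trick can be checked with standard type traits. `Regex` below is a hypothetical stand-in for a non-copyable, non-movable class like RE2 (used so the sketch doesn't depend on RE2 itself); wrapping it in `std::shared_ptr` gives a handle that is default-constructible, copyable, assignable, and destructible, i.e. semi-regular.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a class like RE2: neither copyable nor movable.
class Regex {
public:
    explicit Regex(std::string pattern) : pattern_(std::move(pattern)) {}
    Regex(const Regex&) = delete;
    Regex& operator=(const Regex&) = delete;
    const std::string& pattern() const { return pattern_; }
private:
    std::string pattern_;
};

// The handle that gets lifted instead of Regex itself.
using RegexHandle = std::shared_ptr<Regex>;

static_assert(!std::is_copy_constructible<Regex>::value, "Regex is not copyable");
static_assert(!std::is_move_constructible<Regex>::value, "Regex is not movable");
static_assert(std::is_default_constructible<RegexHandle>::value, "handle has a default");
static_assert(std::is_copy_constructible<RegexHandle>::value, "handle copies");
static_assert(std::is_copy_assignable<RegexHandle>::value, "handle assigns");
static_assert(std::is_destructible<RegexHandle>::value, "handle cleans up");
```

Copying the handle shares one underlying object, and the last copy to go away deletes it.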

The thing is you don't have to directly interface to all possible C++ object types. If you just require the type to be "like a C type" as I did, then you can do almost everything you need to. [I actually find this a better method than using say LLVM but I'll have to live with the status quo on that I guess :]

Yeah this is more in the realm of what mason does - have a look at Mason — Chapel Documentation 1.32 . One of the things we are wrestling with is that we might want to allow non-mason programs to depend on mason packages. In particular, we would like to describe some of the standard library as mason packages (in large part to handle dependencies on C libraries), but we would still like programs compiled with chpl to be able to use these.

I think when our user community expresses wishes for C++ interoperability, they are asking for full interoperability, say to class hierarchies that are dynamically dispatched and the like. But point taken that there are baby steps that could be taken which would avoid the need to manually wrap C++ code in C.

-Brad

Thanks for the link. Yes, even your TOML file is almost the same idea as my flx_pkgconfig files, except it uses field=string whereas I'm using field:string. I also use meta-data to build the whole of my run time (which is C++) and all my tools (except the compiler). However, my system does less than Mason, it seems: it isn't a package manager. If the user wants to use a third-party library, they create the meta-data after manually installing it.

I also use a literate programming tool which extracts files from a document and installs them. Since Felix autobuilds Felix programs, installing Felix code is just a matter of dumping the Felix files in the right directories (there's nothing to compile: Felix executes scripts directly like Python does, and all the compilation and linkage is entirely transparent and fully automated).

Interestingly, instead of using a document generator like chpldoc, I have written a webserver which displays the literate package files directly, so there's nothing to generate .. more precisely the web server does it on the fly. I have some Read-The-Docs files hand written as well but my system has extra demands, including the requirement to display LaTeX symbols correctly, and to display includes as hyperlinks, show mouseovers for the LaTeX symbols, etc.

Mason seems pretty nice though.

I think you should forgive the ignorance of your user community wanting to get full C++ interoperability :)

Object orientation is crap, so there's no need to interface to it. It suffices to use a C like interface to use the C++ class stuff if you find anything actually useful using it. Modern C++ doesn't use OO, it is heavily based on templates. You also do not want to interface entirely to that because it is also crap. Just use the good bits!

Chapel already has pretty much the same as Felix in terms of interfacing to C++: you need to lift types and functions from C++ into your language, which you can already do; you just aren't calling a C++ compiler. Chapel also has a require statement, which has exactly the same intent as Felix's requires clause. The Felix system is a bit more advanced because it can specify abstract resources, and it triggers dependencies based on actual usage, i.e. if you used a C/C++ type or function which you lifted, it will include the header files and link the required library, based on the meta-data, without command-line instructions. If you didn't use it, no #includes are emitted and no library is linked.
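A minimal sketch of usage-driven emission, with a hypothetical lookup table from lifted type names to the header each requires: only types the program actually uses contribute an #include, and each header is emitted once. This illustrates the idea only; it is not how the Felix compiler is implemented, and libraries could be collected the same way.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical table: lifted type name -> the header it requires.
const std::map<std::string, std::string> kRequires = {
    {"intvector", "#include <vector>"},
    {"intmap", "#include <map>"},
};

// Emit each required #include once, in order of first use, and only for
// types the program actually uses.
std::vector<std::string> includes_for(const std::vector<std::string>& used_types) {
    std::set<std::string> emitted;
    std::vector<std::string> out;
    for (const auto& t : used_types) {
        auto it = kRequires.find(t);
        if (it != kRequires.end() && emitted.insert(it->second).second)
            out.push_back(it->second);
    }
    return out;
}
```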

There's a challenge to get Chapel to do what Felix can with C++, for example this is trivial in Felix:

type vector[T] = "::std::vector<?1>" 
  requires Cxx89_header::vector;

because Felix generates C++. The ?1 annotation means "the first type parameter". Doing this level of integration in Chapel is much harder because it generates either C or LLVM directly. Felix is C++ ABI compliant: it was designed from the ground up as a C++ code generator precisely so as to leverage existing C++ code. So unlike other languages with FFIs (Foreign Function Interfaces), including Chapel, there is no "mapping" required for data types: the only mapping is translating C++ types into Felix types, which is compile-time only. Actually, Felix also has ways to specify external C structs, just like Chapel, so the designs there are pretty close.
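The `?1` substitution itself is plain text templating over the generated C++. A minimal sketch, assuming only single-digit placeholders `?1` to `?9` (Felix's actual rules are richer); the function `instantiate` is hypothetical and replaces each `?N` with the C++ spelling of the N-th type argument.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical: substitute ?1..?9 in a C++ type pattern with type arguments.
std::string instantiate(const std::string& pattern, const std::vector<std::string>& args) {
    std::string out;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        bool placeholder = pattern[i] == '?' && i + 1 < pattern.size() &&
                           std::isdigit(static_cast<unsigned char>(pattern[i + 1])) &&
                           pattern[i + 1] != '0';
        if (placeholder) {
            std::size_t n = static_cast<std::size_t>(pattern[i + 1] - '0');  // 1-based
            if (n > args.size())
                throw std::out_of_range("missing type argument ?" + std::to_string(n));
            out += args[n - 1];
            ++i;  // skip the digit
        } else {
            out += pattern[i];
        }
    }
    return out;
}
```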

By comparison, mapping C libraries into Ocaml is possible but it's a lot of work.

But for C++ classes, you just use a C-like interface binding. It's good enough for using classes written in C++, since the C++ side provides any dynamic dispatch. In Felix, if you want dynamic dispatch, you do it the same way as in C, using function pointers (well, actually Felix has first-class functions, meaning closures, so it's a lot more capable than a C function pointer).

If you really want to integrate better with C++, use a C++ compiler instead of C, and drop LLVM. There's a price: C++ can take a LONG time to compile. In Felix, the Felix compiler is about 3 orders of magnitude more sophisticated than C++, but only takes 1/3 of the total compile time. The C++ compilation step is heavily optimised to only use things that are actually required, but it still takes 2/3 of the total compile time (unless you specify -O0, in which case the run time is 10 times slower).

Anyhow, I think Chapel is already doing a pretty good job binding to C, and a command-line option could be tested where it just uses a C++ compiler (i.e. clang++ instead of clang). In fact, they're the same compiler anyhow, just with a few switches different. All modern C/C++ compilers are like that: they do C++ with a few tweaks to handle C as well. Heck, I had a look at an 8-bit microcontroller (Arduino), and you program that in C++ with the GNU toolchain.

New dumb question. I am confused about references. Are references actually types?

For classes, Chapel uses a pointer it calls a reference. So passing a class around is just a pointer, that's fine and I don't think that's an issue.

Also, the manual says you can have reference variables. If the two variables are in the same scope, there is no run-time cost or semantics involved: it's just two names for the same variable. However, if you're referring to a component of some data structure, then it's actually a pointer pretending to be a reference, and that may introduce some serious problems whilst simultaneously trying to solve others.

My concern stems from the observation that whilst the lvalue/rvalue idea works, just barely, in C, because it is purely syntactic, in C++ reference types completely destroy the type system. I would not like the same problem to occur in Chapel.

Reference types destroy the type system when you try to introduce parametric polymorphism. Type constructors must be parametric. For example, T* in C is parametric for data types: if T is a type, T* is also a type, and a distinct type. For tuples, if A and B are types, A * B is also a type, and a distinct type. In C++, if T is a type, T& may or may not be a distinct type; therefore, polymorphism cannot work.
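The contrast can be checked at compile time with standard type traits. Applying `*` always yields a new, distinct type, but applying `&` to a type that is already a reference collapses back to the same type (C++ reference collapsing), so `T&` is not a well-behaved type constructor. The aliases `Ptr` and `Ref` are just illustrative names.

```cpp
#include <cassert>
#include <type_traits>

template <typename T> using Ptr = T*;
template <typename T> using Ref = T&;

// T* always produces a distinct type: int* != int, int** != int*.
static_assert(!std::is_same<Ptr<int>, int>::value, "");
static_assert(!std::is_same<Ptr<Ptr<int>>, Ptr<int>>::value, "");

// T& collapses: applying & to int& gives int& again, not a new type.
static_assert(!std::is_same<Ref<int>, int>::value, "");
static_assert(std::is_same<Ref<Ref<int>>, Ref<int>>::value, "");
```

A generic definition written in terms of `Ref<T>` therefore behaves differently depending on whether `T` happens to be a reference already.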

The intention isn't to have reference types but rather that some variables/formal arguments are references. This is reflected in the syntax: we write ref x = f() to set up a variable that is a reference but var y: int to set up a variable while specifying its type. The type system does sometimes use the kind of intent/variable kind information but it's nowhere like the situation in C++.

Yes, it seems carefully considered to avoid the C++ blunder. But I'm not totally convinced it works yet. Can you have a reference member in a class object that refers to something outside the class (other than a class object I mean)?

I want to present a type system calculus which is heavily tested (meaning I've already implemented it) and based on the kinds of algebras used in research papers (so it's based on category theory). However, there are some novel extensions which power up "homogeneous tuples" enormously. It could maybe also apply to your arrays. The origins are in Barry Jay's FISh, which supports polyadic (rank-independent) array programming.

For me the interesting part is then to add some new operations to the calculus which probably "partition" index sets to support parallel computations. I note Chapel already does some of this: for example, "promoting" a function from an element to apply to an array is precisely what an array functor is required to do. In other languages this is called "map"; in Haskell, the typeclass Functor is defined to have a member "fmap". I'll need a new topic. The idea extends to all functors, but for Chapel the array is the important one, because arrays have random access and so map can be done in parallel.

Reduce is related but heavier. You only have a fixed set of reduction operators, but any suitable function T * T -> T which is associative and commutative should work. However, you actually want a chain of functions, T1 * T2 -> T2 for first-level accumulation, T2 * T3 -> T3 for the second level, and so on, to aggregate the results up a tree.
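A small sequential sketch of both ideas, with hypothetical function names: promotion is an element-wise map, and the layered reduction folds each chunk with one function (here T1 = `int` into T2 = `long long`, counting even values) and then combines the per-chunk partials with a second fold (T2 and T3 both `long long` here). The chunks stand in for per-task or per-locale work; nothing here is Chapel's actual reduction interface. Splitting is only legal because the second-level combine is associative (and commutative, if partials may arrive in any order).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// "Promotion" of an element function over an array is a map.
std::vector<double> promote_square(const std::vector<double>& a) {
    std::vector<double> out(a.size());
    // Each element is independent, so this loop could run in parallel.
    std::transform(a.begin(), a.end(), out.begin(), [](double x) { return x * x; });
    return out;
}

// Two-level reduction: fold each chunk of ints into a long long count of
// even values, then fold the chunk counts into the final result.
long long count_even_two_level(const std::vector<int>& data, std::size_t chunk) {
    assert(chunk > 0);
    std::vector<long long> partials;
    for (std::size_t start = 0; start < data.size(); start += chunk) {
        std::size_t stop = std::min(start + chunk, data.size());
        partials.push_back(std::accumulate(
            data.begin() + static_cast<std::ptrdiff_t>(start),
            data.begin() + static_cast<std::ptrdiff_t>(stop), 0LL,
            [](long long acc, int x) { return acc + (x % 2 == 0 ? 1 : 0); }));
    }
    return std::accumulate(partials.begin(), partials.end(), 0LL);
}
```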

I also did some work recently on a system where you could do operations like "select" on a database. You can do the full select, or you can do a partial select, send the results upstream, and do the rest of the filtering upstream. The problem then becomes: given a calculus which allows a calculation to be partitioned like this, add costing annotations (say, money) for processing on the various stages of the platform and for network transmission, and figure out the optimal place to put the splits based on the cost. [The cost could be real time or even some kind of polynomial function.] The key market there was gene sequencing. Chapel could do that too.

Not today, though there are frequent requests for this. For example "I have a huge array, and I want to have an object refer to it, but I don't want to literally copy it into the object as a field; can't I just have a ref field that refers to it?" The answer today is "no", but this seems like an obviously attractive feature. The workaround people use today is to create a class to wrap such arrays, but this seems like a hassle and the fact that so many people have written the pattern feels unsettling.

There's some discussion of this here: Timeline for ref Fields in Classes/Records? · Issue #8481 · chapel-lang/chapel · GitHub, though I feel as though there's been additional discussion about it since then that I'm not finding at the moment because I know that Michael has warmed up to it more than he was in that issue (primarily because of the introduction of lifetime analysis into the compiler to help with owned/borrowed).

I want to present a type system calculus

I want to warn that I'm personally not likely to pay much attention to this because I'm not particularly good at formal PL systems and am generally very short on time. Don't take that as a judgement of the idea or a negative review, just the facts of my work-life these days.

-Brad

Thanks for the link. The calculus doesn't require any type theory, only high school maths. The only tricky bit is that most people don't know what a variant type is, but it's just a discriminated union. Every C programmer knows how to make a union with a tag value to say which component is stored; that's a manually built variant, where choosing the right union component is a matter of discipline rather than having the compiler enforce it.
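To make the comparison concrete, here is a hand-built C-style tagged union, where picking the live member is up to the programmer, next to C++17's `std::variant`, where the library tracks the active alternative and checks access. The `Shape` types are illustrative only; Felix has its own variant syntax, and C++ is used here just to show the difference in enforcement.

```cpp
#include <cassert>
#include <variant>

// Manually built variant: the tag says which member is live, by discipline only.
struct Shape {
    enum Tag { Circle, Rect } tag;
    union {
        double radius;                 // valid when tag == Circle
        struct { double w, h; } rect;  // valid when tag == Rect
    };
};

double area(const Shape& s) {
    switch (s.tag) {
        case Shape::Circle: return 3.14159 * s.radius * s.radius;
        case Shape::Rect:   return s.rect.w * s.rect.h;
    }
    return 0.0;
}

// Library-enforced variant (C++17): the active alternative is tracked for you.
struct CircleV { double radius; };
struct RectV { double w, h; };
using ShapeV = std::variant<CircleV, RectV>;

double areaV(const ShapeV& s) {
    if (const auto* c = std::get_if<CircleV>(&s)) return 3.14159 * c->radius * c->radius;
    const RectV& r = std::get<RectV>(s);  // throws std::bad_variant_access if wrong
    return r.w * r.h;
}
```

Reading `s.radius` when the tag says `Rect` compiles silently in the first version; in the second, asking for the wrong alternative is caught.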
