New Issue: I/O: Encoder/Decoder proposal

18499, "mppf", "I/O: Encoder/Decoder proposal", "2021-09-30T22:00:14Z"

This issue describes an Encoder/Decoder strategy that is meant to address #7846. In particular, we want to have an Encoder/Decoder mechanism that:

  • is user-extensible
  • makes it clear how to make output that can be read back in (#7952)
  • supports serializing/deserializing object graphs with cycles (#7951)

Along with adding this functionality, this issue proposes deprecating the %h and %j format strings, since Encoder/Decoder subclasses will handle those formats.

In this design, Encoder and Decoder are abstract parent classes. Subclasses implement particular formats, so there will be, for example, a JSONEncoder and a JSONDecoder.

Encoder/Decoder Parent Class

Here are the proposed parent classes forming the interfaces:

class Encoder {
  // Returns the underlying writer channel
  proc writer;
  // Encodes something, with optional key
  proc encode(key:?t = none, value) : void throws;
}
class Decoder {
  // Returns the underlying reader channel
  proc reader;
  // Decodes something of type valueType. Must match key if specified.
  // Throws EndOfIoGroupError if there is no more input in the current object.
  proc decode(key:?t = none, type valueType) : valueType throws;
  // Reads the next key from the current object which can then be read with a ‘decode’ call
  // Throws EndOfIoGroupError if there is no more input in the current object.
  proc decodeNextKey(type keyType): keyType throws;
}

Examples

Here is a simple example of using an Encoder to output the JSON representation of a record:

record Employee {
	var name: string;
	var id: int;
}

proc Employee.encodeTo(encoder: borrowed Encoder) throws {
	encoder.encode(key="name", name);
	encoder.encode(key="id", id);
}

var encoder = new JSONEncoder(…);
var bob = new Employee("Bob", 1);
encoder.encode(bob);  // {"name": "Bob", "id": 1}

Similarly, here is an example showing JSON decoding:

record Employee {
	var name: string;
	var id: int;
}

proc Employee.init(decoder: borrowed Decoder) throws {
	this.name = decoder.decode(key="name", string);
	this.id = decoder.decode(key="id", int);
}

var decoder = new JSONDecoder(…);
var x = decoder.decode(Employee);

The following example shows how encoding works with a nested type:

record Employee {
  var name: string;
  var status: Status;
}
record Status {
  var employed: bool;
  var id: list(int);
}
proc Employee.encodeTo(encoder: borrowed Encoder) throws {
  encoder.encode(key="name", name);
  encoder.encode(key="status", status); // eventually calls Status.encodeTo
}
proc Status.encodeTo(encoder: borrowed Encoder) throws {
  encoder.encode(key="employed", employed);
  encoder.encode(key="id", id);
}

proc Employee.init(decoder: borrowed Decoder) throws {
  this.name = decoder.decode(key="name", string);
  this.status = decoder.decode(key="status", Status); // eventually calls Status.init(Decoder)
}
proc Status.init(decoder: borrowed Decoder) throws {
  this.employed = decoder.decode(key="employed", bool);
  this.id = decoder.decode(key="id", list(int));
}

Reading Fields Out-of-Order

The decoder design allows reading the value associated with a particular key. The decoder uses its knowledge of the file format to read ahead and find the region of the file corresponding to that key. No read-ahead is needed when the requested key is the next one in the input. When reordering is necessary, the Decoder subclass can use mechanisms such as channel mark/rewind and can build a map from keys to file offsets.
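To make the key-to-offset idea concrete, here is a minimal sketch. It is not part of the proposal: ToyDecoder is a hypothetical stand-in for a Decoder subclass, an in-memory list of (key, value) pairs stands in for the channel, and a list index plays the role of a file offset. It assumes a recent Chapel release with the standard List and Map modules.

```chapel
use List, Map;

// Hypothetical stand-in for a Decoder subclass (not part of the proposal).
// 'entries' plays the role of the file; an entry's index is its "offset".
class ToyDecoder {
  var entries: list(2*string);     // (key, value) pairs in file order
  var scanned = 0;                 // offset of the first entry not yet scanned
  var offsets: map(string, int);   // key -> offset, for entries scanned so far

  // Return the value stored under 'key', scanning ahead only as far as needed.
  proc decode(key: string): string {
    while !offsets.contains(key) && scanned < entries.size {
      offsets.add(entries[scanned](0), scanned);
      scanned += 1;
    }
    if !offsets.contains(key) then
      halt("key not found: " + key);
    return entries[offsets[key]](1);
  }
}

var input = new list([("name", "Bob"), ("id", "1")]);
var decoder = new ToyDecoder(input);

writeln(decoder.decode("id"));    // out of order: scans past "name" first
writeln(decoder.decode("name"));  // already indexed: no further scanning
```

A real Decoder subclass would record channel offsets instead of list indices and would seek or rewind the channel rather than index into a list.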

Compiler-generation of encodeTo/init(Decoder)

Today, the Chapel compiler generates readThis and writeThis methods automatically when they are not provided by a type author. We could do the same thing for encodeTo and init(Decoder).
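As a rough illustration of what a generated encodeTo might do, the sketch below visits a record's fields in declaration order and encodes each one under its field name. Everything here is an assumption made for illustration: ToyEncoder and defaultEncodeTo are hypothetical names, the real compiler-generated code could look quite different, and the Reflection calls assume a recent Chapel release.

```chapel
use Reflection;

// Hypothetical stand-in for an Encoder subclass: prints "key = value" lines.
class ToyEncoder {
  proc encode(key: string, value) {
    writeln(key, " = ", value);
  }
}

// Sketch of a default encodeTo: encode every field, in declaration order,
// using the field name as the key.
proc defaultEncodeTo(const ref x, encoder: borrowed ToyEncoder) {
  for param i in 0..getNumFields(x.type)-1 do
    encoder.encode(getFieldName(x.type, i), getField(x, i));
}

record Employee {
  var name: string;
  var id: int;
}

var encoder = new ToyEncoder();
var bob = new Employee("Bob", 1);
defaultEncodeTo(bob, encoder.borrow());
```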

Classes vs Interfaces

This proposal uses parent classes, but interfaces are a reasonable alternative. Parent classes are used because interfaces aren't complete yet and because classes allow the encodeTo / init(Decoder) functions to be concrete, which leads to fewer generic instantiations. Classes have the drawback of virtual dispatch overhead, but this is not expected to be a major issue in the context of I/O.
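The trade-off can be sketched with simplified stand-ins. ToyEncoder and ToyJSONEncoder are hypothetical, and their encode method takes a fixed value type to keep the example small, unlike the generic encode in the proposal. Because encodeTo takes a concrete borrowed ToyEncoder, it is compiled once and the output format is selected by virtual dispatch at run time.

```chapel
// Hypothetical, simplified stand-ins for the proposed classes.
class ToyEncoder {
  proc encode(key: string, value: int): string {
    return key + "=" + value:string;
  }
}

class ToyJSONEncoder : ToyEncoder {
  override proc encode(key: string, value: int): string {
    return '{"' + key + '": ' + value:string + '}';
  }
}

record Employee {
  var id: int;
}

// Concrete formal type: one compiled version serves every encoder subclass.
proc Employee.encodeTo(encoder: borrowed ToyEncoder): string {
  return encoder.encode("id", id);
}

var plain = new ToyEncoder();
var json = new ToyJSONEncoder();
var bob = new Employee(1);

writeln(bob.encodeTo(plain.borrow()));  // dispatches to ToyEncoder.encode
writeln(bob.encodeTo(json.borrow()));   // dispatches to ToyJSONEncoder.encode
```

Had encodeTo instead taken a generic formal (for example, an argument constrained by an interface), the compiler would produce a separate instantiation for each encoder type passed to it.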

Open Questions

  • TODO -- see other issue -- Should ‘write(myCustomRecord)’ call ‘encodeTo’ with some kind of default encoder?
    • In other words, how does ‘read’ / ‘write’ on a channel interact with the Encoder / Decoder?
    • Probably need something like this for ‘writeln(myrecord)’ to work
    • Could/should we deprecate writeThis/readThis/readWriteThis?
      • A type could specify which fields are to be encoded in 'encodeTo'
      • A type could use tertiary methods on a DefaultEncoder to fully customize output