New Issue: What are the expectations from the 'list' data structure?

18095, "e-kayrakli", "What are the expectations from the 'list' data structure?", "2021-07-22T23:21:35Z"

Our current list module was designed based on Python's `list. The historical context in which it was added was when we wanted to stop supporting arrays-as-vec, and wanted a replacement for it.

As a quick summary, our list is implemented as a growing array of arrays, implemented using ddata. This structure supports O(1) insertion and removal at the end, but anywhere else, they require shifting which is O(n) operation in the worst case. You can opt-in for parallel-safety, but then some methods cause compilation errors.

Based on this design and some specific questions, there were some design questions raised in today's module review meeting that begged the question "what do we want the list structure to be?". These questions are

  • The name list suggests O(1) insertion and removal, which is not the case for our implementation. So, is list the right name?
  • Should parallel safety be controlled by a flag, or a parallel-safe list is a different data structure?
    • We've based our design on python's list, but we have much different concerns than Python, of which parallel-safety is one.
  • What do do with pop et al: Reconsider `List.pop()` family of routines · Issue #16194 · chapel-lang/chapel · GitHub
    • The name pop intimates that this is a stack, but it isn't.
    • Should we also add push?
    • What about enqueue/dequeue, then?
  • How can we accommodate different list-like data structures?
    • We have vector and unrolledLinkedList with different performance implications. Are they "list"s?

All these questions led us to re-think what we want the list data structure to be. Do we want;

  1. non-parallel-safe data structure with O(1) insertion and removal
    This is what "list" means for most, but we don't have O(1) insertion and removal at arbitrary locations.
  2. an analogue for Python's list
    We have parallel-safety that complicates matters. It is not clear whether what list is in Python (there's no array type in Python, and list is as close as you can get) represents what we want our list to be. Some of the inconsistencies we have (e.g. pop but no push) are things we inherit from python that we hope to change.
  3. a (optionally) parallel-safe data structure that supports indexing.
    This is what we have today, in my view. The implementation is esoteric and hard to describe in a way that users can argue about complexities of different operations.

(where these are not necessarily mutually exclusive set of options, nor is this an exhaustive list)

Some of the more in-detail design discussions gets stuck because it seems like we're trying to attribute so many properties on a single data structure. Or we don't have a clear vision for what we want the list type to be. This issue is to collect some opinions as what people expect from this data strucuture.