18095, "e-kayrakli", "What are the expectations from the 'list' data structure?", "2021-07-22T23:21:35Z"
Our current list
module was designed based on Python's `list. The historical context in which it was added was when we wanted to stop supporting arrays-as-vec, and wanted a replacement for it.
As a quick summary, our list
is implemented as a growing array of arrays, implemented using ddata. This structure supports O(1) insertion and removal at the end, but anywhere else, they require shifting which is O(n) operation in the worst case. You can opt-in for parallel-safety, but then some methods cause compilation errors.
Based on this design and some specific questions, there were some design questions raised in today's module review meeting that begged the question "what do we want the list
structure to be?". These questions are
- The name
list
suggests O(1) insertion and removal, which is not the case for our implementation. So, islist
the right name? - Should parallel safety be controlled by a flag, or a parallel-safe list is a different data structure?
- We've based our design on python's
list
, but we have much different concerns than Python, of which parallel-safety is one.
- We've based our design on python's
- What do do with
pop
et al: Reconsider `List.pop()` family of routines · Issue #16194 · chapel-lang/chapel · GitHub- The name
pop
intimates that this is a stack, but it isn't. - Should we also add
push
? - What about
enqueue
/dequeue
, then?
- The name
- How can we accommodate different list-like data structures?
- We have
vector
andunrolledLinkedList
with different performance implications. Are they "list"s?
- We have
All these questions led us to re-think what we want the list
data structure to be. Do we want;
- non-parallel-safe data structure with O(1) insertion and removal
This is what "list" means for most, but we don't have O(1) insertion and removal at arbitrary locations. - an analogue for Python's
list
We have parallel-safety that complicates matters. It is not clear whether whatlist
is in Python (there's no array type in Python, andlist
is as close as you can get) represents what we want ourlist
to be. Some of the inconsistencies we have (e.g.pop
but nopush
) are things we inherit from python that we hope to change. - a (optionally) parallel-safe data structure that supports indexing.
This is what we have today, in my view. The implementation is esoteric and hard to describe in a way that users can argue about complexities of different operations.
(where these are not necessarily mutually exclusive set of options, nor is this an exhaustive list)
Some of the more in-detail design discussions gets stuck because it seems like we're trying to attribute so many properties on a single data structure. Or we don't have a clear vision for what we want the list
type to be. This issue is to collect some opinions as what people expect from this data strucuture.