-
Notifications
You must be signed in to change notification settings - Fork 91
Proposals vector
This is a proposal for the 'vector' package to be included in the next major release of the Haskell platform.
Everyone is invited to review this proposal, following the standard procedure for proposing and reviewing packages.
Review comments should be sent to the libraries mailing list before 1 October 2012, which is the discussion deadline.
Proposal author: Johan Tibell
Package maintainer: Roman Leshchinskiy
The following individuals contributed to the review process:
- Henning Thielemann
- Bryan O'Sullivan
- Bas Van Dijk
- Ryan Newton
- Simon Marlow
- Simon Peyton-Jones
- Thomas Schilling
- Gábor Lehel
- Gregory Collins
- Yitzchak Gale
- Brandon Allbery
- Mark Lentczner
- Antoine Latter
The `vector` package is an efficient implementation of `Int`-indexed arrays (both mutable and immutable), with a powerful loop optimisation framework. It supports both boxed and unboxed data, the latter having a more efficient implementation.
Documentation and tarball from the hackage page:
http://hackage.haskell.org/package/vector
Development repo:
darcs get http://code.haskell.org/vector/
Vectors are ubiquitous in programming. They have great cache locality and thus provide both very efficient bulk operations and indexing, outperforming regular Haskell lists in many cases. Vectors are also useful low-level building blocks. For example, the `unordered-containers` package uses vectors to implement the hash array mapped trie used internally to implement e.g. `Data.HashMap`.
The Haskell Platform already provides a vector package, `array`, but many users prefer the `vector` package, due to
- having a more modern and comprehensive API (similar to `bytestring` and `text`), and
- a more efficient implementation.
In addition, while the `Array` type's indexing mechanism is more general, it's slower and can be a nuisance as most algorithms on vectors assume zero-based, `Int`-indexed vectors.
The platform needs a common vector type that other packages can use in their APIs. I argue that `Vector` is a better candidate than `Array` and that we should therefor bless it by moving it into the platform.
The API is structured as follows:
- `Data.Vector` - Boxed vectors of arbitrary types.
- `Data.Vector.Unboxed` - Unboxed vectors with an adaptive representation based on data type families.
- `Data.Vector.Storable` - Unboxed vectors of `Storable` types.
- `Data.Vector.Primitive` - Unboxed vectors of primitive types as defined by the `primitive` package. `Data.Vector.Unboxed` is more flexible at no performance cost.
- `Data.Vector.Generic` - Generic interface to the vector types.
The API has a traditional list-like API for immutable vectors and an imperative API for mutable vectors. The full reference documentation is available on Hackage. There's also an introductory tutorial on the Haskell wiki.
Immutable vectors are processed using bulk operations, such as maps, filters, and folds and not using the explicitly recursive approach often used by lists. This is due to the O(n) cost of `cons`. Incremental construction of immutable vectors is done by mutating a mutable vector and then freezing it, converting it into an immutable vector. Mutation is done either in `ST` or `IO`. The `primitive` package, which vector depends on, allows code to abstract over the underlying monad used in imperative code.
There are a number of higher level algorithms on vector defined in the vector-algorithms package (not proposed for inclusion at this point.)
- Uses a list-like API, which should be very familiar to Haskell programmers.
- Uses type families to get more efficient representations of unpackable data (e.g. primitive values, such as `Int`, and records thereof.)
- Abstracts over the monad in which algorithms working on immutable data runs in. This allows users to e.g. use mutable vectors in `IO` without having to litter their code with `stToIO` calls.
- The `vector` package depends on the `primitive` package, which is a thin wrapper over the `Array#` and `MutableArray#` operations provided by GHC. The package is somewhat of an implementation detail, but it is is independently useful as it provides a monad that abstracts over `IO` and `ST`, which would be needed for e.g. an hash table implementation in the `containers` package. This package would also have to be included in the platform as well as there's currently no way to hide it. -- Johan
Decision: While the package name is maybe not ideal, we will include the package as is. We will not widely advertise its inclusion (it's mostly an implementation detail).
- The package has a large number (11) of `.Safe` modules, for the benefit of the SafeHaskell language feature. These modules cluttered the API and risk to confuse new users (who wouldn't want to use the API in a "safe" way?). I suggest they be moved into their own package that users of SafeHaskell can rely on. -- Johan
Decision: We will include a version of vector without the `.Safe` modules. The platform as a whole will decide on SafeHaskell's place in the platform separately, some time in the future.